SOME NEW METHODS IN MATRIX CALCULATION 1

By Harold Hotelling 
Columbia University 
I. Introduction 

1. The increased practical importance of matrix calculation. This paper will
be concerned chiefly with methods of finding the inverse of a matrix, and of 
finding the latent roots and latent vectors, which are also known by a variety of 
other names associated with particular applications, such as principal axes in 
geometry and mechanics, and principal components in psychology. These two 
computational problems are of extremely wide application. The first is closely 
related to the solution of systems of linear equations, which we shall also consider.
In the method of least squares the solution of the normal equations is
best carried out with the help of the inverse of the matrix of the coefficients, 
since at least some of the elements of this inverse matrix are needed in evaluating 
the results in terms of probability, a vitally necessary step, and since the inverse
matrix is useful also in various other ways, such as altering the set of predictors 
used in a regression equation. Modern statistics also utilizes quadratic and
bilinear forms such as the generalized Student ratio [15] for discriminating
between samples according to multiple variates instead of one only, the associated
discriminant functions [10], the closely related figurative distance of Mahalanobis,
Bose and Roy [5] and the critical statistic in an investigation by Wald [28]
of the efficient classification of an individual into one of two groups. All these
may be calculated very easily from the inverse of a matrix of sums of products,
or of covariances or correlations, or from the principal components.
Consideration of the relations between two sets of variates [18] may utilize both the
inverse of a matrix and a process resembling the calculation of principal
components. Similar computational problems arise in applying to sets of numerous
variates the contributions to multivariate statistical analysis of R. A. Fisher,
S. S. Wilks, W. G. Madow, M. A. Girshick, P. L. Hsu and M. S. Bartlett.
Among the non-statistical applications of the inverse matrix and of latent roots 
and vectors are problems of dynamics, both in astronomy and in airplane design 
[12], the analysis of stresses and strains in structures [26, 27], and electrical
engineering problems [24].

Perhaps no objection to attempts at statistical inference is more common than 
that the variation of this or that relevant factor has been ignored. For example,
in dealing with time series the need of allowing for trend and seasonal variation,
perhaps by means of a sequence of orthogonal polynomials for trend and of 

1 Revision of a paper presented at the Symposium on Numerical Calculation held Dec.
28, 1941 in New York by the Institute of Mathematical Statistics and the American
Statistical Association with the cooperation of the Committee on Addresses in Applied
Mathematics of the American Mathematical Society. For the program of the Symposium see
the Annals of Mathematical Statistics for March, 1942, p. 103.
trigonometric functions for seasonal variation, is well recognized. It is indeed
desirable to use regression equations with a liberal number of predictors to 
eliminate spurious influences, as well as to reduce the error variance, and likewise 
in other statistical methods. But the computational difficulties in the joint 
analysis of the desired number of variables have frequently seemed too formidable.
We shall see how efficient techniques, in conjunction with efficient
machines, can go far to facilitate the use of an appropriate number of variables 
by reducing the labor to modest dimensions. 

While the rise of modern multivariate statistical theory has made available 
new exact tests of hypotheses in terms of probability over a wide range of cases 
in which multiple measurements are involved, such measurements have been 
accumulating on a large scale. In many psychological, anthropometric, astronomical,
meteorological and economic fields, actual measurements are available
on numbers of variates far greater than have been regarded as amenable, within
practical limits, to adequate treatment by the numerical methods generally
used. In some instances the number of cases in which complete sets of these
variates are available is also large. The 1931 census of India included an extensive
sample in which fifty physical variates were measured for each individual.
Karl J. Holzinger and his collaborators have worked out and circulated privately 
a complete matrix of correlations among 78 mental tests. Astronomers have 
indicated the desirability of a recalculation of the elements of the solar system 
by means of a gigantic least-square solution with 150 or more unknowns, at the 
same time deploring the seeming impossibility of this ever being carried out. 
To apply the methods of modern theoretical statistics to derive from such 
observations all the important information they contain is an enterprise whose 
feasibility depends on new numerical methods. 

The chief computational problems, apart from those of tabulating and providing
convenient approximations for the probability distributions, are (1) the calculation
of the many sums of products of pairs of p variates when p is large, and
(2) operations on the matrices of these sums of products such as finding the 
inverse and the principal components. The first problem, which in classical 
applications of the method of least squares to long series has seemed the heavier, 
has in a sense been solved by the use of punched cards. A card is used for each
case, and all p variates are punched into it. By running the cards repeatedly
through a machine wired at each run to select a particular pair of variates,
multiply them together, and cumulate the products, this part of the work may
be disposed of with great speed. The cost of the machines does at present limit
the economical use of this method to rather large numbers, both of variates
and of cases. This limit has recently been pushed upward by the introduction
of improved multiplying calculators, with high-speed automatic multiplication
and squaring locks. But these mechanical advances, in combination with
recent discoveries in statistical theory, the increasingly felt need to resort to
numerous variates, and the actual existence in many cases of data on such multiple
variates, emphasize the need for rapid, economical and accurate calculations
with matrices whose elements are sums of products. 
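
The accumulation described above may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sums_of_products` and the four cases of three variates are hypothetical, and each pass of the inner loop plays the part of one run of the cards for one pair of variates.

```python
def sums_of_products(cases):
    """Symmetric matrix whose (i, j) element is the sum over cases of x_i * x_j."""
    p = len(cases[0])
    s = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            # One "run": select variates i and j, multiply, and cumulate.
            total = sum(case[i] * case[j] for case in cases)
            s[i][j] = s[j][i] = total
    return s

# Hypothetical data: one tuple per case (card), three variates per case.
cases = [(1, 2, 3), (2, 0, 1), (4, 1, 0), (3, 3, 2)]
s = sums_of_products(cases)
```

Only the p(p + 1)/2 distinct pairs are run, since the matrix is symmetric.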
Modern machine methods, especially those of the punched-card type, but
also those using machines such as the Monroe, Marchant and Friden, tend to
reduce the work of formation of sums of products, in comparison with other
operations, to such an extent as to enhance the relative value of methods in
which such calculation of direct product-sums is important. Thus products of
matrices are much simpler to compute than inverses, and positive than negative
powers. Indeed, powers and products of matrices can be computed by means
of punched-card machines, and for large matrices this is doubtless the most
efficient procedure now available, though considerable rewiring is needed. There 
is also a possibility, which does not seem too remote, of development of further 
devices to do this rewiring automatically. 

2. Iterative and direct methods. Partitioned matrices. In later sections 
we shall deal chiefly with certain iterative methods, giving particular attention 
to the neglected question of limits of error in stopping at any point, and considering
the rate of approach to the desired solution. For finding the roots of
a matrix and the associated vectors, if the matrix has more than about four rows, 
it seems clear that an iterative method is the most economical of labor in all but 
very special cases. On the other hand the problems of solving systems of linear 
equations and finding the inverse of a matrix do not usually yield readily to 
iterative methods unless an approximation to the solution is available to begin 
with. This approximation is not necessarily a very close one, but must not be
too wild. It may in some cases be obtained from a general knowledge of the
subject. 

The Mallock electrical device [22] is capable of solving almost instantaneously
ten linear equations in ten unknowns with perhaps two significant digits in each
result, though this question of accuracy remains to be elucidated. The combination
of this device with the iterative method of Section 7 below, and with
the use of partitioning for matrices of more than ten rows, offers what seems at
present the best hope for the systematic inversion of large matrices. Since
only one of the Mallock machines is in existence (it is in Cambridge, England),
some adaptation of the Doolittle or related methods will ordinarily be used.
By taking advantage of the possibilities in modern calculating machines of accumulating
products to reduce the amount of writing required in the Doolittle
method, exceedingly compact and efficient methods have been developed for
solving systems of linear equations and for evaluating inverse matrices by Dwyer
[7, 8, 9], who utilized the earlier work of Waugh, Kurtz, Horst, Dunlap and Cureton
cited by him, and for solving systems of linear equations, by Crout [6].
Dwyer gives valuable bibliographies.

By some of these methods, or from a general knowledge of the subject, one 
may well obtain approximate solutions correct to a very small number of decimal
places, and then by iteration get as many more places as are required, with 
labor far less than would be necessary to carry through from the beginning the 
requisite number of places. Further applications of iterative methods arise 
when a least-square solution is to be revised, either on account of new observations
or because of errors discovered in the original observations or calculations.
But however a least-square calculation or the evaluation of any inverse matrix
begins, and whatever intermediate steps are taken, it seems advisable to terminate
it with the method of Section 7. This combines a check on the previous
work, at a labor cost equivalent merely to substituting the values found for the 
unknowns into the equations, with an improvement in accuracy and a useful 
limit of error for the unknowns. 

In the inversion of large matrices there are important possibilities in the
properties of partitioning. For example, a square matrix of 2p rows may be
partitioned into four square matrices a, b, c, d, of p rows, and written

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

If this is multiplied on the right by another partitioned square matrix of 2p
rows which may be written

$$\begin{bmatrix} A & C \\ B & D \end{bmatrix},$$

where A, B, C, D are square p-rowed matrices, the product

$$\begin{bmatrix} aA + bB & aC + bD \\ cA + dB & cC + dD \end{bmatrix}$$

is identical with the result of partitioning the product of the two original
2p-rowed matrices. If the second is the inverse of the first, this product is the
identical matrix. Consequently, if the first matrix is given, we have for determining
its inverse the four matrix equations in A, B, C, D,

$$aA + bB = 1, \qquad aC + bD = 0,$$

$$cA + dB = 0, \qquad cC + dD = 1,$$

where 1 stands for the identical matrix of p rows and 0 for the p-rowed matrix
consisting entirely of zeros. These equations may be solved just as in elementary
algebra except that care must be used to perform matrix multiplications in correct
order. Thus

$$A = (a - bd^{-1}c)^{-1}, \qquad B = -d^{-1}cA,$$

$$D = (d - ca^{-1}b)^{-1}, \qquad C = -a^{-1}bD.$$

These formulae call for inversion of four p-rowed matrices, namely $d$, $a - bd^{-1}c$,
$a$ and $d - ca^{-1}b$. Without changing the number of such inversions we may
choose alternative sets of matrices to invert, with economy of labor in certain
cases. For example, if b is easy to invert, we may use for D the expression

$$D = b^{-1}aAbd^{-1}.$$
The formulae and numerical work are further simplified if the given matrix is
symmetric. Other modes of partitioning are also possible, and may be valuable
in various kinds of numerical work. Another method of obtaining the inverse
of a matrix by partitioning is given by Frazer, Duncan and Collar [12, pp. 112-118],
who also give an account of general properties of partitioned matrices.
In the treatment of relations between two or more sets of variates [18, 31], 
partitioned matrices appear. 
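
The partitioned formulae for the inverse may be checked numerically. The following Python sketch, using exact fractions, is offered only as an illustration: the 4-rowed example matrix, the helper names and the small Gauss-Jordan routine used to invert the 2-rowed blocks are assumptions of this sketch and not a procedure prescribed in the text. It assumes that a, b, d and the two bracketed combinations are all non-singular.

```python
from fractions import Fraction

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def sub(x, y):
    return [[u - v for u, v in zip(r, s)] for r, s in zip(x, y)]

def neg(x):
    return [[-u for u in r] for r in x]

def inverse(m):
    """Gauss-Jordan inverse in exact fractions (assumes m is non-singular)."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partitioned_inverse(a, b, c, d):
    """Blocks A, B, C, D of the inverse, from the four formulae in the text."""
    a_inv, d_inv = inverse(a), inverse(d)
    A = inverse(sub(a, matmul(matmul(b, d_inv), c)))
    B = neg(matmul(matmul(d_inv, c), A))
    D = inverse(sub(d, matmul(matmul(c, a_inv), b)))
    C = neg(matmul(matmul(a_inv, b), D))
    return A, B, C, D

# Hypothetical 4-rowed matrix, partitioned into 2-rowed blocks.
a = [[4, 1], [1, 3]]
b = [[1, 0], [2, 1]]
c = [[0, 1], [1, 1]]
d = [[5, 2], [2, 4]]
A, B, C, D = partitioned_inverse(a, b, c, d)

full = [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]
assembled = [rA + rC for rA, rC in zip(A, C)] + [rB + rD for rB, rD in zip(B, D)]
product = matmul(full, assembled)
identity = [[Fraction(int(i == j)) for j in range(4)] for i in range(4)]

# Alternative expression for D, usable when b is easy to invert.
D_alt = matmul(matmul(matmul(matmul(inverse(b), a), A), b), inverse(d))
```

Here `product` should reproduce the identical matrix exactly, and `D_alt` should agree with `D`, since exact fractions leave no rounding error to obscure the comparison.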

The most efficient method of calculation of a function of a matrix will depend 
in part on what else is to be calculated. For example, if the latent roots and 
vectors are needed for any reason as well as the inverse of a matrix, it is better
to calculate the former first, and then the determination of the inverse matrix 
becomes a trivial task; but if the latent roots and vectors are not needed for some 
other purpose it is usually better not to calculate them but to use a more direct 
method to obtain the inverse. If in addition to the inverse the determinant is 
wanted, or many consecutive powers of a matrix, or if a matrix-multiplying 
machine considerably speedier than present procedures becomes available, a 
method [3] based on the Cayley-Hamilton theorem that a matrix satisfies its 
own characteristic equation may be recommended. 

Iterative methods have what Whittaker and Robinson [30] call the pleasing 
characteristic that mistakes do not necessarily spoil the whole calculation, 
but tend to be corrected at later stages. This of course does not mean that there 
is no penalty for mistakes. They have an obvious tendency to prolong the 
number of repetitions required, and if repeated at late stages may actually prevent
realization of a substantially correct result. A less obvious consequence of
mistakes near the termination of an iterative calculation is that they tend to
vitiate any limits of error that may be derived, including those that will be found
below. Great care should be used to insure accurate calculation especially in 
the last stages of any iterative process. 

To insure accuracy even before the last stages, and therefore efficiency, a
check column consisting of the sums of the elements in the rows of matrices
multiplied and added together may well be carried along. In multiplying two
matrices only the check column of the second factor is used; it is multiplied by 
each row of the first factor to obtain the check column for the product. A 
computer thoroughly experienced with matrix multiplication may dispense with 
the check column at all stages but the last of an iterative process, relying on the 
self-correcting property of the process. 
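
The use of the check column in multiplication may be illustrated by a short Python sketch. The matrices and the function names are hypothetical. The sketch relies only on the fact that the row sums of a product equal the first factor multiplied by the column of row sums of the second factor.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def row_sums(m):
    return [sum(row) for row in m]

def check_column_of_product(first, second):
    """Check column for the product, using only the check column of `second`."""
    s = row_sums(second)
    return [sum(u * v for u, v in zip(row, s)) for row in first]

first = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
second = [[1, 2, 3], [4, 0, -1], [2, 2, 1]]

product = matmul(first, second)
check = check_column_of_product(first, second)
agrees = row_sums(product) == check

# A deliberate slip in one element of the product is exposed by the check column.
spoiled = [row[:] for row in product]
spoiled[1][2] += 1
slip_detected = row_sums(spoiled) != check
```

The check identifies the row in which a slip occurred but not the column, and compensating errors within one row would escape it.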

A simple but extremely valuable bit of equipment in matrix multiplication 
consists of two plain cards, with a re-entrant right angle cut out of one or both 
of them if symmetric matrices are to be multiplied. In getting the element of
the ith row and jth column of the product, the ith row of the first factor and the
jth column of the second should be marked by a card beside, above, or below it.
In writing a symmetric matrix it is convenient to omit the elements below the 
principal diagonal. The re-entrant right angle is then utilized to mark off the 
numbers belonging to a particular row. 
A report [13] on certain iterative methods of solving linear and other equations
and of calculating latent roots and vectors, with engineering applications, was
published by R. von Mises and H. Geiringer in 1929. As part of a discussion
of certain problems in psychology [16] the present author in 1933 described
iterative processes both for solving systems of linear equations and for finding
principal components, and later [17] showed how to accelerate convergence to
principal components by repeatedly squaring the matrix. Further acceleration
of convergence by other devices has been discovered by A. C. Aitken [2]. Dr.
Geiringer has also discussed a method of solution of equations involving iteration
by small groups of unknowns [14]. The method of Kelley and Salisbury [20]
should be noted. It has been used extensively by psychologists. Definite
limits of error and measures of rate of convergence for this method are missing.
Certain other iterative methods will be discussed in later sections. It will appear
that the most-used methods are by no means the best.

Questions regarding the probability of a matrix of covariances satisfying 
particular conditions of computational significance may in some cases be illuminated
with the help of the theory of the variates as a random sample of a
larger aggregate. This theory was outlined in the latter part of the paper [16].

II. Linear Equations and Inverse Matrices 

3. Accuracy of direct solution of linear equations. The question how many 
decimal places should be retained in the various stages of a least-square solution 
and of other calculations involving linear equations has been a puzzling one.
It has not generally been realized how rapidly errors resulting from rounding 
may accumulate in the successive steps of such procedures as, for example, the 
Doolittle method. In this popular algorism for solving a system of equations 

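$$\sum_{j=1}^{p} a_{ij} x_j = g_i \qquad (i = 1, \cdots, p),$$

the equivalent of successive eliminations of $x_1, x_2, \cdots, x_{p-1}$ to obtain an equation
in $x_p$ alone is accomplished by calculating successively

$$a_{ij.1} = a_{ij} - a_{i1}a_{1j}/a_{11}, \qquad g_{i.1} = g_i - a_{i1}g_1/a_{11} \qquad (i, j = 2, 3, \cdots, p),$$

then

$$a_{ij.12} = a_{ij.1} - a_{i2.1}a_{2j.1}/a_{22.1}, \qquad g_{i.12} = g_{i.1} - a_{i2.1}g_{2.1}/a_{22.1},$$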




and so forth. Let us suppose that each of the $a_{ij}$'s and $g_i$'s is subject to an
error concerning which it is known only that its absolute value does not exceed
$\epsilon$. If they are given accurately to k decimal places only, we have $\epsilon = 10^{-k}/2$.
Let the actual errors be represented by $\delta a_{ij}$ and $\delta g_i$. If these are
small an estimate of the error in $g_{i.1}$ may be obtained by expanding in a Taylor
series and retaining only the linear terms:

$$\delta g_{i.1} = \delta g_i - \frac{a_{i1}}{a_{11}}\,\delta g_1 - \frac{g_1}{a_{11}}\,\delta a_{i1} + \frac{a_{i1}g_1}{a_{11}^2}\,\delta a_{11}.$$

The closest upper bound for this error obtainable without special assumptions
regarding the values of the given quantities is specified by the inequality

$$|\delta g_{i.1}| \le \epsilon \left( 1 + \left|\frac{a_{i1}}{a_{11}}\right| + \left|\frac{g_1}{a_{11}}\right| + \left|\frac{a_{i1}g_1}{a_{11}^2}\right| \right).$$
The a's and g's are often correlation coefficients. Any set of normal equations
of least squares may be reduced to a form in which this is the case, and this reduction
has considerable merits. The various correlation coefficients are frequently
of interest in themselves, and their use in the normal equations
practically insures that all the quantities appearing at any stage are of the same 
order of magnitude. This last is a very substantial advantage, partly because 
of the check column which is customarily carried along, in which each entry is 
the sum of the other entries in its row. Since the absolute value of a correlation 
coefficient is less than unity, and since $a_{11}$ becomes equal to unity, the last inequality
gives in this case

$$|\delta g_{i.1}| < 4\epsilon,$$

and no closer inequality appears possible. In the same way we find for this
case in which the a's are correlation coefficients that

$$|\delta a_{ij.1}| < 4\epsilon.$$

Proceeding from these inequalities in the same way, and neglecting the fact
that $|a_{22.1}| < 1$ though like $a_{11}$ it is put equal to unity in the argument, we find
for the errors in $a_{ij.12}$ and $g_{i.12}$ the estimated upper bound $16\epsilon$, with an actual
upper bound somewhat higher unless $a_{12} = 0$. Continuing in the same way we
find for $a_{ij.12\cdots(p-1)}$ and $g_{i.12\cdots(p-1)}$ the estimated limit of error $4^{p-1}\epsilon$, with a possibility
of a somewhat higher value up to $4^{p-1}\epsilon/a$, where a is the determinant
$|a_{ij}| < 1$. The rapidity with which this increases with p is a caution against
relying on the results of the Doolittle method or other similar elimination 
methods with any moderate number of decimal places when the number of 
equations and unknowns is at all large. Thus if p = 11 the limit of error exceeds
a million times $\epsilon$, indicating that if only one decimal place is wanted in the value
of $x_p$ the original correlations must be utilized to at least seven decimals, even
if we neglect the additional errors introduced by dropping decimals beyond
those retained in the intermediate stages of the calculation. The errors accumulate
further during the back solution, so that if all the unknowns are
wanted with one-place accuracy it is necessary to use the original correlations 
with substantially more than seven decimal places. For larger values of p the 
increase in the error limit is startling. Thus for p = 27 (the number of tests 
reported to be involved in a certain current procedure in classifying military 
personnel) the limit of error even for the first unknown evaluated is $4^{26}\epsilon$, representing
a loss of about 16 decimal places of accuracy, while the correlations in
Holzinger’s 78-rowed matrix would need to be carried to no less than 46 places
to insure even an approximate accuracy in the first decimal place of one of the 
regression coefficients in a formula derived by least squares for predicting one of 
his variates in terms of all the others. 
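
The figures quoted in this paragraph follow from the estimated limit $4^{p-1}\epsilon$ by simple arithmetic, which may be reproduced as follows. The function names are illustrative only. The number of decimal places lost is taken here as the common logarithm of the factor $4^{p-1}$.

```python
import math

def error_factor(p):
    """Estimated multiple of the rounding error after p - 1 eliminations."""
    return 4 ** (p - 1)

def decimal_places_lost(p):
    """Common logarithm of the error factor, i.e. decimal places of accuracy lost."""
    return (p - 1) * math.log10(4)

for p in (11, 27, 78):
    print(p, error_factor(p), round(decimal_places_lost(p), 1))
```

For p = 11 the factor is 1,048,576, which exceeds a million. For p = 27 the loss lies between 15 and 16 places, and for p = 78 it lies between 46 and 47.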

These high limits of error may possibly be reduced in the following ways: 
(a) a more exact study of the error might be made by means of terms of the
Taylor series of orders higher than the first; (b) the positive definite character
of a correlation matrix (or other matrix of normal equations) might be utilized
in an attempt to arrive at lower limits of error; (c) instead of considering the
maximum possible error we might depend on some mutual cancellation of different
errors and content ourselves with statements in terms of probability.
The compounding of different errors of rounding, which may individually be
regarded as having a probability distribution of uniform density over a fixed
range, quickly gives rise to an almost exactly normal distribution of known
mean and variance, so that the probability approach is attractive. However
the limits of error obtained in this way with, for example, a five per cent level of
probability of a greater error, though somewhat smaller than the limits associated
with certainty, are disappointingly large. Investigations of the types
(a) and (b) have not been made; they would apparently be very cumbersome,
and (a) might have the effect of increasing the error limits considered above
instead of cutting them down. Use of the check column does not provide any
safeguard against the errors of rounding appearing in the original correlations,
though from the probability standpoint a carefully devised use of the check
column may mitigate the accumulation of errors in successive stages. 

To control such errors reliance is often placed in a substitution of the solution
obtained in the given equations. This is not completely satisfactory, since
under some circumstances large errors in the solution may yield only slight
deviations of the left from the right members of the equations, and since some
deviations must be expected in any case in which only a limited number of
decimals is carried along. Moreover this substitution, even if it reveals the
existence of errors, does not usually make clear at once what should be done
about them. A recalculation to a larger number of decimal places is horribly
laborious. There is here a distinct need of using an iterative process for improving
on the solution obtained, and setting definite limits for the errors.

4. The classical iterative method. The iterative method which seems to be 
the oldest and the most used for solving systems of linear equations, and which 
may like all other methods of doing this be applied to find the inverse of a 
matrix, is that of Gauss and Seidel. It seems also to be used in the “method 
of relaxations” [26], which has been recommended to engineers but lacks limits 
of error and measures of rate of convergence. 

This classical method, starting with any assumed values for the unknowns,
begins by changing the value for the first unknown so as to satisfy the first 
equation; this is possible if the coefficient is different from zero. The revised 
set of trial values is then further altered by changing the second unknown so as
to satisfy the second equation. Then the third unknown is altered so that the
third equation will be satisfied, and so forth. When all the unknowns have
been thus altered the cycle may be begun again, and repeated until the differences
between consecutive values of each unknown become small enough to
indicate a satisfactory convergence. The method converges if the matrix A of
the coefficients $a_{ij}$ is positive definite, as it is for the normal equations of least
squares, and also in certain engineering applications [7, 8, 9]. Moreover the
character of being positive definite insures that each $a_{ii}$ differs from zero, so that
the successive adjustments indicated are all actually possible. In the published
discussions, proofs of convergence have sometimes been omitted, and in some
cases (e.g. [30], Sec. 130) the proofs are incomplete. Even the fuller proofs
[13] and [16] fail to give explicit limits for the errors in stopping at any particular
stage. But from the discussion [16, pp. 502, 504] it is easy to see that positive
numbers $d_i$ and k exist, with k < 1, such that the error in the mth estimate of $x_i$
is less than $d_i k^m$. This limit of error diminishes in geometric progression with
successive iterations; hence the number of decimal places of accuracy increases
approximately in arithmetic progression. The progression is however irregular
and the trial values may fluctuate considerably. Numerical determination of 
limits of error does not appear to be easy. Experience with the method indicates 
that it is satisfactory only in case a really good approximation is available to 
begin with, in spite of its universal convergence. 
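
A minimal Python sketch of the classical cycle described above is given here for concreteness. The 3-rowed positive definite system, the tolerance and the function names are assumptions chosen for illustration. The stopping rule, based on the change between consecutive cycles, is the customary one and, as remarked above, does not by itself furnish a definite limit of error.

```python
def gauss_seidel_cycle(a, g, x):
    """One full cycle: each unknown in turn is altered to satisfy its own equation."""
    x = list(x)
    p = len(a)
    for i in range(p):
        s = sum(a[i][j] * x[j] for j in range(p) if j != i)
        x[i] = (g[i] - s) / a[i][i]
    return x

def gauss_seidel(a, g, x0, tol=1e-10, max_cycles=500):
    """Repeat cycles until consecutive trial values differ by less than tol."""
    x = list(x0)
    for cycle in range(1, max_cycles + 1):
        new_x = gauss_seidel_cycle(a, g, x)
        change = max(abs(u - v) for u, v in zip(new_x, x))
        x = new_x
        if change < tol:
            return x, cycle
    return x, max_cycles

# Hypothetical positive definite system resembling a small correlation matrix.
a = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
g = [0.6, 0.5, 0.4]

x, cycles = gauss_seidel(a, g, [0.0, 0.0, 0.0])
residual = [sum(a[i][j] * x[j] for j in range(3)) - g[i] for i in range(3)]
```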

5. An acceleration and extension of the classical iteration. This classical
scheme may be improved in the following way if numerous cycles of revision
of the trial values are expected to be needed for the requisite accuracy. The
first step, consisting of replacing the trial value $x_1$ by

$$x_1' = (g_1 - a_{12}x_2 - \cdots - a_{1p}x_p)/a_{11}$$

and leaving $x_2, \cdots, x_p$ unchanged, amounts to subjecting the p + 1 variables
$x_0, x_1, \cdots, x_p$ to the homogeneous transformation

$$\begin{aligned}
x_0' &= x_0, \\
x_1' &= (g_1x_0 - a_{12}x_2 - \cdots - a_{1p}x_p)/a_{11}, \\
x_2' &= x_2, \\
&\;\;\vdots \\
x_p' &= x_p,
\end{aligned}$$

where the symbol $x_0$, introduced for convenience in order to make these equations
homogeneous, is always equal to unity. The matrix of the transformation,


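$$T_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
g_1/a_{11} & 0 & -a_{12}/a_{11} & -a_{13}/a_{11} & \cdots & -a_{1p}/a_{11} \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix},$$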
is of course singular. If $X_0$ denote the one-column, (p + 1)-rowed matrix of
the initial trial values, with unity at the head of the column, the column matrix
$X_1 = T_1X_0$ is the result of this first operation, again with unity at the head of
the column. The trial values obtained by the second operation appear likewise
in the column matrix $X_2 = T_2X_1 = T_2T_1X_0$, where

$$T_2 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & 0 & \cdots & 0 \\
g_2/a_{22} & -a_{21}/a_{22} & 0 & -a_{23}/a_{22} & -a_{24}/a_{22} & \cdots & -a_{2p}/a_{22} \\
0 & 0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}.$$

The result of a complete cycle of substitutions may be written
$X_p = T_pT_{p-1} \cdots T_2T_1X_0$, where the matrices $T_i$ are of the same simple character
illustrated by $T_1$ and $T_2$. This same result will be obtained, because of the associative
law of matrix multiplication, if we first calculate numerically the
matrix

$$T = T_pT_{p-1} \cdots T_2T_1$$

and then $X_p = TX_0$. (Experience shows that computers need at this point the
caution that the matrices must be arranged in their proper order. A good procedure
is first to form $T_2T_1$, then to multiply this by $T_3$ on the left, etc.) This
requires rather more work than the original Gauss-Seidel scheme, and therefore
is not worth while if only one cycle of substitutions is needed.

The advantage lies in the fact that T may readily be squared, and $T^2X_0$ gives
a result equivalent to that of two full cycles of iteration by the Gauss-Seidel
method. Furthermore, $T^2$ may be squared to give $T^4$, which may also be
squared, and so on. Obviously k such squarings give a matrix which, when multiplied
by $X_0$, yields the same result as $2^k$ complete cycles of the original substitutions.
In terms of the number k of squarings the number of decimal places
of accuracy tends to increase in geometrical instead of arithmetic progression.
This modification of the classical method does not seem to have been published
heretofore, though both it and the method of Section 7 have been in use by the
author and his students since 1936.
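
The equivalence between k squarings of T and $2^k$ cycles of the classical substitutions may be verified on a small example. The sketch below works in exact fractions so that the two results can be compared without rounding. The 3-rowed system and all function names are hypothetical, and the construction of each $T_i$ assumes the homogeneous form with $x_0 = 1$ at the head of the column.

```python
from fractions import Fraction

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def apply(t, column):
    return [sum(u * v for u, v in zip(row, column)) for row in t]

def step_matrix(a, g, i):
    """T_i (i counted from 1): the identity except for row i, which re-solves equation i."""
    p = len(a)
    t = [[Fraction(int(r == c)) for c in range(p + 1)] for r in range(p + 1)]
    t[i] = [Fraction(g[i - 1]) / a[i - 1][i - 1]] + [
        Fraction(0) if j == i else -Fraction(a[i - 1][j - 1]) / a[i - 1][i - 1]
        for j in range(1, p + 1)
    ]
    return t

def cycle_matrix(a, g):
    """T = T_p ... T_2 T_1, each new factor being applied on the left."""
    t = step_matrix(a, g, 1)
    for i in range(2, len(a) + 1):
        t = matmul(step_matrix(a, g, i), t)
    return t

def gauss_seidel_cycle(a, g, x):
    x = list(x)
    for i in range(len(a)):
        s = sum(a[i][j] * x[j] for j in range(len(a)) if j != i)
        x[i] = (g[i] - s) / a[i][i]
    return x

# Hypothetical symmetric positive definite system whose solution is (1, 1, 1).
a = [[4, 1, 1], [1, 3, 1], [1, 1, 5]]
g = [6, 5, 7]
x0 = [Fraction(0)] * 3

k = 3
t = cycle_matrix(a, g)
for _ in range(k):
    t = matmul(t, t)                 # after k squarings t is the 2**k-th power of T
accelerated = apply(t, [Fraction(1)] + x0)[1:]

plain = x0
for _ in range(2 ** k):
    plain = gauss_seidel_cycle(a, g, plain)
```

Three squarings thus replace eight cycles. Whether this saves labor depends on p, since each squaring is a full multiplication of (p + 1)-rowed matrices.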

R. A. Fisher [11, Sec. 29] has introduced the valuable method of finding the 
inverse of a matrix A by solving together p systems, each of p equations in p 
unknowns, with the same matrix A of coefficients, but different columns of 
unknowns; these several columns of unknowns are the elements of the identical 
matrix. The technique of carrying this out by any of the methods resembling 
that of Doolittle is a simple extension involving replacement of the right-hand 
members of the equations by l’s and 0’s and carrying along p such columns 
instead of one while applying exactly the same linear operations to the rows as 
in the older problem. This, like the problem of solving linear equations, has 
been elegantly adapted to efficient calculation with modern machines by Dwyer 
[7, 8, 9]. The foregoing iterative methods may also be applied in this case, but
the matrix T will be different for the different columns. When the given matrix
is symmetric (as is implied by the positive definite character assumed in
the proofs of convergence) the number of iterations required is generally cut 
down because the determination of each column determines also the elements 
of the corresponding row which lie in other columns. Iteration by groups 
[14] may well have a place here. 
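
Fisher's device of carrying along p right-hand columns of 1's and 0's may be sketched as follows. This is a plain elimination without the compact arrangements of Doolittle or Dwyer. It assumes that no pivot vanishes, as is the case for a positive definite matrix, and the example matrix and function names are hypothetical.

```python
from fractions import Fraction

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def inverse_by_identity_columns(a):
    """Solve p systems at once: same coefficients, right-hand members the identity."""
    p = len(a)
    rows = [[Fraction(v) for v in a[i]] + [Fraction(int(i == j)) for j in range(p)]
            for i in range(p)]
    # Forward elimination: identical row operations act on all p right-hand columns.
    for k in range(p):
        for i in range(k + 1, p):
            f = rows[i][k] / rows[k][k]
            rows[i] = [u - f * v for u, v in zip(rows[i], rows[k])]
    # Back solution, one right-hand column at a time.
    inv = [[Fraction(0)] * p for _ in range(p)]
    for col in range(p):
        for i in reversed(range(p)):
            s = sum(rows[i][j] * inv[j][col] for j in range(i + 1, p))
            inv[i][col] = (rows[i][p + col] - s) / rows[i][i]
    return inv

a = [[4, 1, 1], [1, 3, 1], [1, 1, 5]]
inv = inverse_by_identity_columns(a)
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
check = matmul(a, inv)
```

Since the example matrix is symmetric, so is its inverse, which illustrates why each column determined also fixes the corresponding row.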

An observation of A. C. Aitken’s [1] is noteworthy in connection with the
solution of equations with a non-symmetric matrix, and with the finding of the
inverse of such a matrix. Writing the equations in the matrix form $AX = G$,
we see that the solution $X = A^{-1}G$ is also the solution of the system $(A'A)X = A'G$,
where A' is the transverse (also called the transpose or conjugate) of A.
Evidently A'A and A'G can be formed by direct multiplications and additions,
without divisions. Since A'A is symmetric, any of the methods for solving
symmetric equations are applicable to the new system. To find the inverse of A
we may first find the inverse of the symmetric matrix A'A and then postmultiply
it by A'; for $(A'A)^{-1}A' = A^{-1}$.
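
Aitken's observation may be illustrated on a small non-symmetric system. In the following sketch the matrix, the right-hand members and the helper names are hypothetical. The inversion of A'A is done by an ordinary Gauss-Jordan reduction in exact fractions merely for brevity, and any method for symmetric matrices could take its place.

```python
from fractions import Fraction

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse(m):
    """Gauss-Jordan inverse in exact fractions (assumes m is non-singular)."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

a = [[2, 1, 0], [0, 3, 1], [1, 0, 4]]      # non-symmetric, determinant 25
g = [[3], [4], [5]]

at = transpose(a)
ata = matmul(at, a)                         # symmetric, formed without divisions
atg = matmul(at, g)

x = matmul(inverse(ata), atg)               # solution of (A'A)X = A'G
a_inverse = matmul(inverse(ata), at)        # (A'A)^{-1} A'
```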

6. Roots, norms and convergence of matrices. The norm of a matrix A may
be defined as the square root of the sum of the products of its elements by their
complex conjugates, and denoted by N(A). If A is real and $a_{ij}$ is the element
in the ith row and jth column,

$$N(A) = \sqrt{\sum\sum a_{ij}^2}. \tag{6.1}$$

This is the same function which Wedderburn [29, p. 125] defines as the absolute
value of A and denotes by A with a heavy vertical bar on each side. Since it
is rather troublesome to avoid confusing this with the determinant of A, we use
the notation N(A), though the analogy with the ordinary absolute value of a
quantity is very suggestive in connection with proofs of convergence and limits
of error obtained by means of the “triangular inequalities” below. Rella [25]
gives a different definition of the absolute value of the matrix as the maximum
of the absolute values of its roots.

The triangular inequalities, whose proof is easy with the help of the Cauchy 
inequality, are: 

(6.2)    $N(A + B) \le N(A) + N(B)$,

(6.3)    $N(AB) \le N(A)N(B)$.

From the last it follows that for any positive integer $m$,

(6.4)    $N(A^m) \le [N(A)]^m$.

Hence if $N(A) < 1$, the limit of $N(A^m)$ as $m$ increases is zero. It then follows
that the limit of $A^m$ itself is zero, i.e. that each of its elements approaches zero,
because of the definition of the norm. 
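The inequalities (6.2) to (6.4) are easy to check on small cases. The sketch below is an illustration only; the matrices `A` and `B` are made-up data and the helper names are assumptions, not notation from the paper.

```python
# Numerical check of the triangular inequalities for the norm N(A) of (6.1).

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def norm(a):
    """Square root of the sum of squares of the elements, as in (6.1)."""
    return sum(x * x for row in a for x in row) ** 0.5

A = [[0.2, -0.3], [0.4, 0.1]]      # N(A) = sqrt(0.30), which is below unity
B = [[1.0, 2.0], [-1.0, 0.5]]
nA, nB = norm(A), norm(B)

sum_ok = norm(add(A, B)) <= nA + nB + 1e-12        # (6.2)
product_ok = norm(matmul(A, B)) <= nA * nB + 1e-12  # (6.3)

# (6.4): N(A^m) <= N(A)^m, so the powers of A shrink to zero when N(A) < 1.
power_norms = []
P = A
for m in range(1, 9):
    power_norms.append((norm(P), nA ** m))   # (actual norm of A^m, bound)
    P = matmul(P, A)
```

Each pair in `power_norms` holds the norm of a power of `A` next to its bound from (6.4); both columns decrease toward zero.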

The identical matrix of p rows, which we shall denote simply by 1, has the 
norm $\sqrt{p}$, while a scalar matrix $k$ (i.e. one with the quantity $k$ in each element
of the principal diagonal and zeros elsewhere) has the norm $k\sqrt{p}$. The norm
of a $p$-rowed orthogonal matrix is $\sqrt{p}$.

The roots of a square matrix, also known as the latent roots or characteristic
roots, are the values $\lambda_1, \cdots, \lambda_p$ of $\lambda$ for which the determinant obtained by
subtracting $\lambda$ from each element of the principal diagonal vanishes. By expanding
this determinant in powers of $\lambda$ and using a relation between roots and coefficients
of an equation, it is evident that the sum of the roots equals the sum of
the elements in the principal diagonal. This sum is known as the trace of the
matrix and denoted by tr(A). Thus

(6.5)    $\lambda_1 + \lambda_2 + \cdots + \lambda_p = \mathrm{tr}(A)$.

From the definitions of the transverse and norm of A it is plain that 

(6.6)    $[N(A)]^2 = \mathrm{tr}(AA')$
if A is real. 

If $f(x)$ is any polynomial in $x$, $f(A)$ is a matrix whose roots are known [29,
p. 30] to be $f(\lambda_i)$, $(i = 1, 2, \cdots, p)$. In particular, the roots of $A^m$ are $\lambda_i^m$.
Consequently

(6.7)    $\lambda_1^m + \lambda_2^m + \cdots + \lambda_p^m = \mathrm{tr}(A^m)$.

All the roots of a zero matrix are zero. But the fact that all the roots of a
matrix are zero does not necessarily imply that the matrix is zero; for example
the roots of

(6.8)

are both zero. But for real symmetric matrices the vanishing of all the roots
does imply the vanishing of the matrix; for the sum of the squares of the elements
of a symmetric matrix equals the sum of squares of the roots, since $A = A'$
and by (6.7), (6.6) and (6.1),

$\sum \lambda_i^2 = \mathrm{tr}(A^2) = \mathrm{tr}(AA') = [N(A)]^2 = \sum \sum a_{ij}^2$.


Moreover, by continuity considerations, a sequence of $p$-rowed symmetric
matrices must approach zero if all the roots approach zero, and conversely.

From this it is evident that a necessary and sufficient condition that $A^m$
approach zero as $m$ increases, when $A$ is symmetric, is that all the roots of $A$
be less than unity in absolute value. This provides a sharper criterion of convergence than the
requirement that $N(A) < 1$, which is sufficient but not necessary for convergence.
The latter is however far easier to apply in most numerical work, since
it is far easier to compute $N(A)$ than the greatest root. Moreover it is easy to
find upper bounds for $N(A)$ in various ways, of which the crudest is the observation
that $N(A)$, by (6.1), cannot exceed $p$ times the greatest absolute value of any
element of $A$. Also, the test in terms of the norm is applicable to asymmetric
as well as symmetric matrices.

From these considerations regarding the convergence of $A^m$ we deduce at once
the following result. If the norm of a square matrix is less than unity, then all
the roots are less than unity in absolute value. The converse is not true, as the
example (6.8) shows.

For any real square matrix A, symmetric or not, 

(6.9)    $\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_p^2 \le [N(A)]^2$.

To prove this, we observe first that $2a_{ij}a_{ji} \le a_{ij}^2 + a_{ji}^2$, and consequently
$\mathrm{tr}(A^2) \le \mathrm{tr}(AA')$. From (6.7) and (6.6) we then have $\sum \lambda_i^2 \le \mathrm{tr}(AA') = [N(A)]^2$.
This reasoning shows incidentally that $\sum \lambda_i^2$ is real, though the individual
roots may be complex.
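A short numerical illustration of these statements follows. It is a sketch under stated assumptions: the nilpotent matrix `Z` is an illustrative choice (not necessarily the matrix printed as (6.8)), and the asymmetric matrix `A` is made-up data. The sum of squares of the roots is obtained from (6.7) as tr(A^2), so no roots need be computed.

```python
# Roots all zero without the matrix being zero, and the inequality (6.9).

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Illustrative nilpotent matrix: tr(Z) = tr(Z^2) = 0, so both roots are zero,
# yet Z is not the zero matrix and its norm is 1, not less than unity.
Z = [[0.0, 1.0], [0.0, 0.0]]
Z_squared = matmul(Z, Z)

# Asymmetric example with complex roots: the sum of their squares is real.
A = [[1.0, 2.0], [-3.0, 0.5]]
sum_of_squared_roots = trace(matmul(A, A))        # by (6.7) with m = 2
norm_squared = trace(matmul(A, transpose(A)))     # by (6.6)
```

For this `A` the sum of the squared roots is negative, which is possible only because the roots are complex, and it is below the squared norm as (6.9) requires.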

Not only for investigating convergence, but also in the important but neglected
problems of setting definite limits of error after a finite number of steps,
the norm is an extremely useful function. If a matrix is to be computed with
such accuracy that the error in each element is less than $\delta$, and $A$ is the matrix
of errors, the requisite accuracy will according to (6.1) be attained when $N(A) < \delta$.
The definition and theorems regarding the norm are valid without any
restriction to square matrices, for which alone the roots are defined. For
example, we may use the norm to derive an inequality concerning the solution
of the system of $p$ linear equations

$\sum_j a_{ij} x_j = g_i$,

which may be written in matrix form $AX = G$, where $A$ is a square matrix and
$X$ and $G$ are matrices each of one column and $p$ rows. From (6.3) we find
$N(G) \le N(A)N(X)$, whence

$N(X) \ge N(G)/N(A)$.

We shall now deduce a result which seems to be new to matrix theory and 
which we shall later apply to find limits of error. If A is any matrix such that 
1 — A is non-singular the identity 

$(1 - A)^{-1} = 1 + A + A^2 + \cdots + A^{m-1} + A^m(1 - A)^{-1}$

holds, and may be demonstrated exactly as if $A$ were an ordinary scalar quantity.
Suppose that $N(A) \le k < 1$. Taking the norm and using (6.2), (6.3) and
(6.4), we have

$N[(1 - A)^{-1}] \le p^{1/2} + k + k^2 + \cdots + k^{m-1} + k^m N[(1 - A)^{-1}]$.

Since $k < 1$ we may solve for $N[(1 - A)^{-1}]$. Summing the geometric progression,
we obtain:

$N[(1 - A)^{-1}] \le \dfrac{p^{1/2} - 1}{1 - k^m} + \dfrac{1}{1 - k}$.

This holds for every positive integral value of $m$, and therefore in the limit when
$m$ becomes infinite. Thus we find that

(6.10)    $N[(1 - A)^{-1}] \le p^{1/2} - 1 + \dfrac{1}{1 - k}$

whenever $N(A) \le k < 1$.
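The bound (6.10) can be compared with an actual inverse. In the sketch below, which is illustrative only, the inverse of 1 - A is accumulated from the series 1 + A + A^2 + ... (legitimate here because N(A) < 1); the matrix `A` and all helper names are assumptions.

```python
import math

# Compare N[(1 - A)^{-1}] with the bound p^{1/2} - 1 + 1/(1 - k) of (6.10).

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def norm(a):
    return math.sqrt(sum(x * x for row in a for x in row))

def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

A = [[0.2, -0.3], [0.4, 0.1]]     # made-up matrix with N(A) < 1
p = len(A)
k = norm(A)

# Sum the series 1 + A + A^2 + ... ; 200 terms are far more than needed here.
S = identity(p)
term = identity(p)
for _ in range(200):
    term = matmul(term, A)
    S = add(S, term)

one_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(p)] for i in range(p)]
check = matmul(one_minus_A, S)    # should reproduce the identical matrix
defect = norm([[check[i][j] - (1.0 if i == j else 0.0) for j in range(p)] for i in range(p)])

actual = norm(S)
bound = math.sqrt(p) - 1 + 1 / (1 - k)
```

In this instance `actual` is about 1.55 against a `bound` of about 2.63, so the limit is safe but not tight.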

7. An efficient inversion procedure. Let $C_0$ be an approximation to the
inverse of a matrix $A$, and consider the following sequence of operations. Calculate

(7.1)    $C_1 = C_0(2 - AC_0)$,

and then in turn $C_2, C_3, \cdots$ where

(7.2)    $C_{m+1} = C_m(2 - AC_m)$.

Let us inquire as to the conditions under which the sequence of matrices $C_m$
converges to $A^{-1}$, the maximum error that may be committed in stopping at
any stage, and the rate of convergence. Suppose that $C_0$ is a good enough
approximation to $A^{-1}$ to make the roots of the matrix

(7.3)    $D = 1 - AC_0$

all less than unity in absolute value. Then increasing powers of $D$ approach
zero, and the convergence of $C_m$ to $A^{-1}$ will follow from the relation

(7.4)    $C_m = A^{-1}(1 - D^{2^m})$,

which will now be proved by mathematical induction. From (7.1) and (7.3),

$C_1 = A^{-1}(AC_0)(1 + D) = A^{-1}(1 - D)(1 + D) = A^{-1}(1 - D^2)$,

so that (7.4) is verified for $m = 1$. Now assume (7.4) for a particular value
of $m$, and substitute it in (7.2). This gives

$C_{m+1} = A^{-1}(1 - D^{2^m})(1 + D^{2^m}) = A^{-1}(1 - D^{2^{m+1}})$,

which being of the same form as (7.4) completes the induction.
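The squaring of the residual at every step, which is what (7.4) expresses, can be watched directly. The following is a minimal sketch with made-up data: the 2-rowed matrix `A`, the rough starting value `C` and the function names `residual` and `refine` are assumptions for illustration.

```python
# Iteration (7.2), C_{m+1} = C_m (2 - A C_m), on a small made-up example.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def norm(a):
    return sum(x * x for row in a for x in row) ** 0.5

def residual(A, C):
    """D_m = 1 - A C_m, with 1 the identical matrix."""
    AC = matmul(A, C)
    p = len(A)
    return [[(1.0 if i == j else 0.0) - AC[i][j] for j in range(p)] for i in range(p)]

def refine(A, C):
    """One step of (7.2); note that 2 - AC = 1 + (1 - AC)."""
    D = residual(A, C)
    p = len(A)
    one_plus_D = [[(1.0 if i == j else 0.0) + D[i][j] for j in range(p)] for i in range(p)]
    return matmul(C, one_plus_D)

A = [[4.0, 1.0], [2.0, 3.0]]          # exact inverse is [[0.3, -0.1], [-0.2, 0.4]]
C = [[0.25, 0.0], [-0.25, 0.5]]       # deliberately rough first approximation
residual_norms = [norm(residual(A, C))]
for _ in range(5):
    C = refine(A, C)
    residual_norms.append(norm(residual(A, C)))
```

Since the residual after each step is the square of the previous one, each entry of `residual_norms` is at most the square of its predecessor, by (6.3).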

If $N(D) \le k < 1$ the roots of $D$ are all less than unity in absolute value, as
shown in Sec. 6, and the foregoing result holds. Assuming this to be true we
now derive an upper bound for the error in $C_m$ in terms of $k$ and $N(C_0)$. According
to (7.3),

$A^{-1} = C_0(1 - D)^{-1}$.

Hence, by (7.4),

$C_m - A^{-1} = -A^{-1}D^{2^m} = -C_0(1 - D)^{-1}D^{2^m}$.

Therefore, by (6.3), (6.4) and (6.10),

(7.5)    $N(C_m - A^{-1}) \le N(C_0)\, k^{2^m} \left( p^{1/2} - 1 + \dfrac{1}{1 - k} \right)$.

This sets an upper bound for the difference between each element of $C_m$ and the
corresponding element of $A^{-1}$. A slightly looser but simpler limit may be
obtained from this in terms of the greatest absolute value $c$ of any element of
$C_0$. Since $N(C_0) \le cp$,

(7.6)    $N(C_m - A^{-1}) \le k^{2^m} cp \left( p^{1/2} - 1 + \dfrac{1}{1 - k} \right)$.

The great value of this method, whenever a good enough initial approximation 
is available to make N(D) less than unity, is that the number of decimal places 
of sure accuracy increases in geometric progression, rather than in arithmetic 
progression as with the usual methods. Consequently this method will always 
be the most efficient if a sufficiently large number of decimal places is required. 
Moreover, a limit can be set in advance for the number of iterations that will be 
required in order to insure any required degree of accuracy. If certainty of
correctness in the $s$th decimal place is required we may choose $m$ so that the
right-hand member of (7.6) is less than $10^{-s}/2$. In terms of logarithms to the
base 10 the number of decimal places whose accuracy is assured by $m$ iterations
is thus at least

(7.7)    $2^m |\log k| - \log 2 - \log \{ cp[p^{1/2} - 1 + (1 - k)^{-1}] \}$.
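Formula (7.7) lends itself to a small calculation of how many iterations must be planned. The sketch below is illustrative; the function names are assumptions, and the sample values k = .072, c = 2.1, p = 4 are those quoted for the first illustration of Section 8.

```python
import math

def guaranteed_places(m, k, c, p):
    """Value of (7.7): decimal places assured after m iterations."""
    factor = c * p * (math.sqrt(p) - 1 + 1 / (1 - k))
    return (2 ** m) * abs(math.log10(k)) - math.log10(2) - math.log10(factor)

def iterations_needed(s, k, c, p):
    """Smallest m for which (7.7) assures at least s decimal places."""
    m = 0
    while guaranteed_places(m, k, c, p) < s:
        m += 1
    return m

# Sample values taken from the first illustration of Section 8.
places_after_two = guaranteed_places(2, 0.072, 2.1, 4)
needed_for_six = iterations_needed(6, 0.072, 2.1, 4)
```

Because the first term of (7.7) doubles with every iteration while the other terms are fixed, the assured number of places roughly doubles at each step once the first term dominates.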

These limits of error can be bettered after some iterations have actually been
made. When $C_r$ becomes available we may calculate $k_r = N(1 - AC_r)$, which
may be used in place of $k$ in the formulae just derived if $m$ is replaced by $m - r$,
and is generally enough smaller than $k$ to make a marked improvement.

The elements of the matrix of errors will actually, of course, be smaller than
the norm of this matrix in every practical case, in a ratio fluctuating about $p^{-1}$.
The limits obtained by our formulae can be reached only in case the entire error
of the matrix $C_m$ is concentrated in one element, a very unlikely event. Thus
the limits given above will usually be quite conservative.

As the iteration proceeds the elements of the matrix $D_m = 1 - AC_m = D^{2^m}$
will diminish rapidly in case of convergence. For this reason it may sometimes
be better to calculate $C_{m+1}$ not directly from (7.2), but from the formula

(7.8)    $C_{m+1} = C_m + C_m D_m$

in which the last term can be regarded as a correction of $C_m$ which will often be
very small. This method, however, lacks the self-checking feature, so that
its use at the final stage is dubious.

This iterative process has been noticed previously [12, p. 120], but without a
limit of error or observation of the geometric progression in the number of
accurate digits.


If the initial approximation is not good enough to make $N(D) < 1$, it may
be improved by other methods, such as those of Sections 4 and 5, to the point
at which this more rapid method becomes applicable. But in some cases (e.g.
the second example of §8) the method converges even though $N(D) > 1$, as
may be demonstrated at a later stage at which the norm of the matrix corresponding
to $D$ becomes numerically less than unity.

For the mass of least-square and other problems in which the inverse of a
matrix is needed, the best procedure appears to be to begin with one of the methods
described by Dwyer [7, 8, 9], carried to a small number of decimal places, and
then to calculate $D$ from (7.3), a step equivalent to substituting the approximate
solution obtained into the equations. It may then be evident at a glance that
the norm of $D$ is so small that the method of the present section will converge
rapidly to give as many more places as desired. If $N(D)$ is too large for this,
and if gross errors have been eliminated, there is a choice between recalculation
from the beginning, the classical iterative process, and the acceleration of this
process by matrix-squaring, with perhaps some iteration by small groups. The
choice will depend partly on how much the elements of $D$ need to be reduced.
The classical iteration (or sometimes the process of this section) is appropriate
for correcting a slight excess of $N(D)$ over unity, its matrix-squaring extension
for larger alterations.

Let $E_0$ be the error in $C_0$, so that $C_0 = A^{-1} + E_0$. Then by (7.1),

$C_1 = (A^{-1} + E_0)(1 - AE_0) = A^{-1} - E_0AE_0$.

If $E_1$ is the error in $C_1$, so that $C_1 = A^{-1} + E_1$, we thus have

$E_1 = -E_0AE_0$.

If $A$ is symmetric, we naturally take $C_0$ as a symmetric matrix, and this will
cause $E_0$, $C_1$, and $E_1$ also to be symmetric. If also $A$ is positive definite, it will
follow from the last equation that $E_1$ is negative definite, or negative semi-definite.
Consequently the diagonal elements of $C_1$ tend to underestimate the
corresponding elements of $A^{-1}$, and never exceed them. Furthermore, the
value of a quadratic form whose matrix is $A^{-1}$ will be at least as great as the
estimate of it based on $C_1$. The squares, both of the multiple correlation
coefficient and the generalized Student ratio [15], can be expressed as such
quadratic forms. Hence both these statistics are slightly underestimated when
$C_1$ is used in place of the true matrix of coefficients. Later approximations $C_m$
do not change the signs of these biases, though they make their magnitudes
approach zero in case the conditions for convergence are satisfied, and definite
limits converging to zero are easily found for them in such cases from the results
above.


8. Illustrations and further comments. We shall indicate symmetric matrices
by writing only the elements on and above the principal diagonals.

(i) To illustrate various methods Dwyer [7] has evaluated the inverse of

$$A = \begin{bmatrix} 1 & .4 & .5 & .6 \\ & 1 & .3 & .4 \\ & & 1 & .2 \\ & & & 1 \end{bmatrix}$$

as

$$\begin{bmatrix} 2.0710 & -.1913 & -.7759 & -1.0109 \\ & 1.2842 & -.2186 & -.3552 \\ & & 1.3989 & .2732 \\ & & & 1.6940 \end{bmatrix}.$$

If the accuracy of the calculation had been only such as to insure correctness
in the first decimal place the approximation to $A^{-1}$ would have been

$$C_0 = \begin{bmatrix} 2.1 & -.2 & -.8 & -1.0 \\ & 1.3 & -.2 & -.4 \\ & & 1.4 & .3 \\ & & & 1.7 \end{bmatrix}.$$

It is easy by mental arithmetic alone, without the use of a machine or side
calculations, to see that

$$D = 1 - AC_0 = \begin{bmatrix} -.02 & .02 & 0 & -.01 \\ 0 & 0 & -.02 & .03 \\ .01 & -.01 & 0 & -.02 \\ -.02 & .04 & -.02 & 0 \end{bmatrix},$$

and further that $N(D) = \sqrt{.0052} = .072$. This is so much less than unity
that the iteration process of §7 will converge rapidly. As a matter of fact,
without determining the sum of the squares of the elements of $D$ we could
have observed at a glance that $N(D)$ must be less than four times the greatest
absolute value of an element, and thus have a value less than .16. In the same
way $N(C_0)$ is seen to be less than 8.4; actually it equals 3.8588. The latter
value, with $k = .072$, $p = 4$, substituted in (7.5) gives for the norms of the
successive error matrices $E_m = C_m - A^{-1}$,

$N(E_0) < 8.03k = .578$,

$N(E_1) < 8.03k^2 = .0414$,

$N(E_2) < 8.03k^4 = .000216$,

$N(E_3) < 8.03k^8 = .000\,000\,0058$.


This promises merely that after one application of the iterative process the
results will be accurate to one decimal place, which we know already but might
not have known for sure in such a case; that a second iteration will give results
accurate to three places, and that a third will give results accurate to about
eight places. These estimates will however be improved after actually computing
$C_1$. This may well be done by (7.1) if a machine is available; otherwise,
and almost as easily, by (7.8) we obtain

$$C_1 = \begin{bmatrix} 2.070 & -.190 & -.776 & -1.011 \\ & 1.282 & -.218 & -.355 \\ & & 1.398 & .274 \\ & & & 1.692 \end{bmatrix}$$

and $N(C_1) = 3.8163$. (We have now passed beyond the stage of easy mental
calculation, but might alternatively use the easy upper bound 8.28 for $N(C_1)$,
obtained as before.) We shall use this value instead of $N(C_0)$ in (7.5) and at
the same time use for $k$ the value of $N(\Delta)$, where

$\Delta = 1 - AC_1 = 1 - AC_0(1 + D) = 1 - (1 - D)(1 + D) = D^2$.

This is most easily found from $D$, from which it may be written down directly
by mental calculation:

$$\Delta = 10^{-4} \times \begin{bmatrix} 6 & -8 & -2 & 8 \\ -8 & 14 & -6 & 4 \\ 2 & -6 & 6 & -4 \\ 2 & -2 & -8 & 18 \end{bmatrix}.$$

The norm of $\Delta$ is seen by the crude method to be less than .0072, and is actually
.003212. Taking the latter value for $k$ we have, similarly to (7.5),

$N(E_m) \le N(C_1)\, k^{2^{m-1}} \times 2.00323 = 7.645\, k^{2^{m-1}}$.

Thus,

$N(E_1) < .0246$,

$N(E_2) < .000\,0789$,

$N(E_3) < .000\,000\,000\,8$.

The reduction in these limits of error is due to the difference between $[N(D)]^2 = .0052$
and $N(D^2) = .003212$.

Using $C_2 = C_1 + C_1\Delta$ we obtain:

$$C_2 = \begin{bmatrix} 2.0710366 & -.1912542 & -.7759568 & -1.0109294 \\ & 1.2841486 & -.2185780 & -.3551910 \\ & & 1.3989056 & .2732260 \\ & & & 1.6939852 \end{bmatrix}.$$

From this we calculate

$$\Delta_2 = 1 - AC_2 = 10^{-8} \times \begin{bmatrix} 112 & -164 & -40 & 168 \\ -164 & 288 & -136 & 88 \\ 64 & -128 & 100 & -104 \\ 48 & -32 & -184 & 364 \end{bmatrix},$$

agreeing with the value obtained from the formula $\Delta_2 = \Delta^2$, and finally

$$C_3 = C_2(1 + \Delta_2) = \begin{bmatrix} 2.071\,038\,251 & -.191\,256\,831 & -.775\,956\,284 & -1.010\,928\,962 \\ & 1.284\,153\,005 & -.218\,579\,235 & -.355\,191\,257 \\ & & 1.398\,907\,104 & .273\,224\,045 \\ & & & 1.693\,989\,071 \end{bmatrix},$$

which as shown above is correct to at least eight decimal places, and doubtless
more, in each element. The estimate of $A^{-1}$ obtained by Dwyer by several
direct methods to four places is corroborated by this result excepting for a
slight error in the element in his first row and third column.
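The arithmetic of this first illustration can be retraced mechanically. The sketch below is a check under one stated assumption: the matrix `A` is taken to be the 4-rowed correlation matrix with off-diagonal elements .4, .5, .6, .3, .4, .2, which is the matrix consistent with the values of $C_0$ and $D = 1 - AC_0$ given in the text. Function names are hypothetical.

```python
# Retrace illustration (i): residual D, its norm, and three refinement steps.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def norm(a):
    return sum(x * x for row in a for x in row) ** 0.5

def residual(A, C):
    AC = matmul(A, C)
    p = len(A)
    return [[(1.0 if i == j else 0.0) - AC[i][j] for j in range(p)] for i in range(p)]

def correct(C, D):
    """Formula (7.8): C + C D, equal to C (2 - A C) when D = 1 - A C."""
    CD = matmul(C, D)
    return [[c + d for c, d in zip(rc, rd)] for rc, rd in zip(C, CD)]

# Assumed matrix, chosen to agree with the printed C_0 and D.
A = [[1.0, 0.4, 0.5, 0.6],
     [0.4, 1.0, 0.3, 0.4],
     [0.5, 0.3, 1.0, 0.2],
     [0.6, 0.4, 0.2, 1.0]]
C0 = [[2.1, -0.2, -0.8, -1.0],
      [-0.2, 1.3, -0.2, -0.4],
      [-0.8, -0.2, 1.4, 0.3],
      [-1.0, -0.4, 0.3, 1.7]]

D = residual(A, C0)
k = norm(D)                       # the text gives sqrt(.0052) = .072
C1 = correct(C0, D)
C2 = correct(C1, residual(A, C1))
C3 = correct(C2, residual(A, C2))
final_residual = norm(residual(A, C3))
```

The first corrected matrix `C1` reproduces the three-place values given above, and `final_residual`, the norm of the eighth power of `D`, is far below the limits of error quoted.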

(ii) Suppose that the approximation in the foregoing example had been even
cruder, with determination of the elements of $A^{-1}$ only to the nearest integer.
This would give

$$C_0 = \begin{bmatrix} 2 & 0 & -1 & -1 \\ & 1 & 0 & 0 \\ & & 1 & 0 \\ & & & 2 \end{bmatrix}, \qquad D = \begin{bmatrix} .1 & -.4 & .5 & -.2 \\ -.1 & 0 & .1 & -.4 \\ .2 & -.3 & .5 & .1 \\ 0 & -.4 & .4 & -.4 \end{bmatrix}.$$

The sum of the squares of the elements of $D$ is 1.51, so that the norm is greater
than unity, and it is not clear at this stage whether the iterative process we have
been using will converge or not. But upon computing

$$D^2 = \begin{bmatrix} .15 & -.11 & .18 & .27 \\ .01 & .17 & -.16 & .19 \\ .15 & -.27 & .36 & .09 \\ .12 & .04 & 0 & .36 \end{bmatrix}$$

we find that $N(D^2) = \sqrt{.6093} = .7806$, and since this is less than unity we are
assured that the process will converge. We may write immediately, without
use of a machine or written side calculation:

$$C_1 = C_0 + C_0D = \begin{bmatrix} 2.0 & -.1 & -.9 & -1.1 \\ & 1.0 & .1 & -.4 \\ & & 1.0 & .3 \\ & & & 1.4 \end{bmatrix}.$$

Utilizing the value of $D^2$ already determined, we readily find

$$C_2 = C_1 + C_1D^2 = \begin{bmatrix} 2.032 & -.138 & -.848 & -1.056 \\ & 1.138 & -.042 & -.372 \\ & & 1.182 & .274 \\ & & & 1.558 \end{bmatrix}.$$


From this point on a machine is needed for efficiency. The next step is to
calculate $D^4$, either by squaring $D^2$ or by the formula $D^4 = 1 - AC_2$; both
methods may be used as a check. The result is:

$$D^4 = 10^{-4} \times \begin{bmatrix} 808 & -730 & 1094 & 1330 \\ 20 & 786 & -830 & 890 \\ 846 & -1560 & 1998 & 540 \\ 616 & 80 & 152 & 1696 \end{bmatrix}.$$

We may now consider the accuracy of further approximations, inserting in
(7.5) $N(C_2) = \sqrt{13.385572} = 3.659$ in place of $N(C_0)$, $m - 2$ for $m$, and
$k = N(D^4) = .4119$. Thus

$N(E_2) < (9.8807)(.4119) = 4.0699$

$N(E_3) < (9.8807)(.4119)^2 = 1.6764$

$N(E_4) < (9.8807)(.4119)^4 = .2844$

$N(E_5) < (9.8807)(.4119)^8 = .00819$

$N(E_6) < (9.8807)(.4119)^{16} = .000\,006\,79$.


Because of the roughness of the initial approximation in this case the convergence
is rather slow at first, but later it is much accelerated. So far as the
limits found above show, five iterations are necessary to be sure of even approximate
two-place accuracy in the results (somewhat better limits could be obtained
after actually calculating $C_3$, still better ones from $C_4$, etc.), but the
sixth iteration gives results sure to be accurate nearly to five places. Perhaps
the best treatment of a numerical case of this kind is to work out the solution
by Dwyer's method to two, three or four places, and then to apply the iterative
process once, and as many more times as necessary to obtain the required
accuracy.

The final step should, for the sake of checking, be a calculation of $C_{m+1}$ from
$C_m(2 - AC_m)$, rather than from $C_m + C_mD^{2^m}$.

Upon observing that $N(D) > 1$ we might have used the Seidel process to
improve each row of $C_0$. This process is however extremely slow, and in the
present example is markedly inferior to that used above.
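The second illustration, in which the norm of $D$ exceeds unity while that of $D^2$ does not, can be reproduced in the same manner. As before this is only a sketch: the matrix `A` is assumed to be the 4-rowed correlation matrix consistent with the printed residuals, and the function names are hypothetical.

```python
# Illustration (ii): N(D) > 1 but N(D^2) < 1, and the process still converges.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def norm(a):
    return sum(x * x for row in a for x in row) ** 0.5

def residual(A, C):
    AC = matmul(A, C)
    p = len(A)
    return [[(1.0 if i == j else 0.0) - AC[i][j] for j in range(p)] for i in range(p)]

def correct(C, D):
    """C + C D, i.e. one step of the iteration written as in (7.8)."""
    CD = matmul(C, D)
    return [[c + d for c, d in zip(rc, rd)] for rc, rd in zip(C, CD)]

A = [[1.0, 0.4, 0.5, 0.6],          # assumed matrix of the illustration
     [0.4, 1.0, 0.3, 0.4],
     [0.5, 0.3, 1.0, 0.2],
     [0.6, 0.4, 0.2, 1.0]]
C0 = [[2.0, 0.0, -1.0, -1.0],       # inverse rounded to the nearest integer
      [0.0, 1.0, 0.0, 0.0],
      [-1.0, 0.0, 1.0, 0.0],
      [-1.0, 0.0, 0.0, 2.0]]

D = residual(A, C0)
D2 = matmul(D, D)
C = C0
approximations = [C0]
for _ in range(6):
    C = correct(C, residual(A, C))
    approximations.append(C)
final_residual = norm(residual(A, C))
```

`approximations[2]` agrees with the matrix $C_2$ given above, and after the sixth step the residual norm is below the limit .000 006 79 divided by the factor 9.8807, as the bounds require.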

(iii) If we start from the result which Dwyer gives to four decimal places as
$C_0$, we obtain

$$D = 1 - AC_0 = 10^{-5} \times \begin{bmatrix} 1 & 4 & -3 & -2 \\ 3 & -2 & 1 & 0 \\ 3 & 3 & -1 & 1 \\ 0 & 2 & 0 & -2 \end{bmatrix}.$$

We find $N(C_0) = 3.8188$, and putting $k = N(D) = .00085$ we have from (7.5),

$N(E_m) < 3.8188\,(.00085)^{2^m}(2.00085) < (7.6408)(.00085)^{2^m}$.

Thus

$N(E_1) < .000\,0055$,

$N(E_2) < .000\,000\,000\,0004$.


9. Certain other methods of successive approximation. A class of methods
for solving linear equations, which may be extended to find the inverse of a
matrix, is given by Frazer, Duncan and Collar [12, pp. 132-133], generalizing
a method of J. Morris. In this method the matrix $A$ of the coefficients in the
linear equations, or the matrix to be inverted, is written as the sum of an easily
inverted matrix $V$, for example a diagonal or triangular matrix, and another
matrix $W$. Then

$A^{-1} = (1 + V^{-1}W)^{-1}V^{-1} = (1 - f)^{-1}V^{-1}$,

where $f = -V^{-1}W$. If the latent roots of $f$ are all less than unity in absolute
value, and a fortiori if $N(f) < 1$, the series

$1 + f + f^2 + f^3 + \cdots = (1 - f)^{-1}$

converges. To solve the equations $AX = G$, where $X$ and $G$ are column vectors
(i.e. matrices of one column) is to determine

$X = A^{-1}G = (1 - f)^{-1}H$,

where $H = V^{-1}G$. The method of Frazer, Duncan and Collar is to calculate
the successive vectors

$X_0 = H, \quad X_1 = H + fX_0, \quad X_2 = H + fX_1, \quad \cdots, \quad X_r = H + fX_{r-1}, \quad \cdots$.

It is clear that

$X_r = (1 + f + f^2 + \cdots + f^r)H$.

The error in $X_r$ is therefore the vector

$E_r = f^{r+1}(1 - f)^{-1}H$.

We may ascertain a limit for the errors if $N(f) \le k < 1$. Indeed, by (6.3)
and (6.10),

$N(E_r) \le k^{r+1} \left( p^{1/2} - 1 + \dfrac{1}{1 - k} \right) N(H)$,

where $p$ is the number of unknowns; and no individual unknown will have an
error greater than $N(E_r)$.
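A small trial of this scheme, with the limit of error just derived, is sketched below. The system is made-up illustration data (its exact solution is 10/49, 9/49, 43/98), $V$ is taken as the diagonal of $A$, and all names are assumptions.

```python
import math

# Successive vectors X_r = H + f X_{r-1} with V = diagonal of A, and the
# error limit k^{r+1} (p^{1/2} - 1 + 1/(1 - k)) N(H).

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vnorm(v):
    return math.sqrt(sum(x * x for x in v))

def mnorm(m):
    return math.sqrt(sum(x * x for row in m for x in row))

A = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]   # made-up system
G = [1.0, 2.0, 3.0]
p = len(A)

# f = -V^{-1} W, where W holds the non-diagonal elements of A; H = V^{-1} G.
f = [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(p)] for i in range(p)]
H = [G[i] / A[i][i] for i in range(p)]
k = mnorm(f)

X = H[:]
iterates = [X]
for r in range(1, 61):
    X = [h + fx for h, fx in zip(H, matvec(f, X))]
    iterates.append(X)

exact = [10 / 49, 9 / 49, 43 / 98]
factor = math.sqrt(p) - 1 + 1 / (1 - k)
errors = [vnorm([a - b for a, b in zip(x, exact)]) for x in iterates]
limits = [k ** (r + 1) * factor * vnorm(H) for r in range(len(iterates))]
```

Every entry of `errors` lies below the corresponding entry of `limits`; the limits are conservative because they rest on the norm of `f` rather than on its greatest root.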

Convergence of this method, if existent, may be accelerated by matrix-squaring.
Indeed, upon calculating in turn $f^2, f^4, f^8, \cdots$ by repeated squarings,
we need only to work with the sequence

$X_0 = H, \quad X_1 = (1 + f)X_0, \quad X_3 = (1 + f^2)X_1,$

$X_7 = (1 + f^4)X_3, \quad X_{15} = (1 + f^8)X_7, \quad \cdots,$

omitting the intermediate approximations. This will be worth while for solving
a single set of equations only in case such great accuracy is required as to demand
the use of rather high powers of $f$. Each squaring of $f$ consists of the
formation of $p^2$ sums of products, so that determination of, say, $X_{31}$ by this
method requires $4p^2$ such sums after $f$ has been determined, in addition to the
$5p$ involved in finding $X_1, X_3, X_7, X_{15}, X_{31}$ after the squarings. By the
method of Frazer, Duncan and Collar the corresponding number of sums of
products would be $31p$. Since $4p^2 + 5p < 31p$ only in case $p \le 6$, it appears
that the matrix-squaring is justified only for six or fewer unknowns unless a
larger number of terms is required. Furthermore, increasingly high powers of
a matrix, to be useful, need usually to be expressed with more and more significant
digits.
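The comparison of labor can be tabulated for any number of unknowns. The following sketch merely encodes the counts stated in the text; the function names and the generalization to an arbitrary number `n` of doublings are assumptions.

```python
# Sums of products needed to reach X_{2^n - 1} for p unknowns.

def sums_with_squaring(p, n):
    """n - 1 squarings of f (p^2 sums each) plus n products of a matrix by a vector (p each)."""
    return (n - 1) * p * p + n * p

def sums_plain(p, n):
    """One product of f by a vector (p sums) for each of the 2^n - 1 steps."""
    return (2 ** n - 1) * p

n = 5   # the case X_31 discussed in the text: 4p^2 + 5p against 31p
favourable = [p for p in range(1, 21) if sums_with_squaring(p, n) < sums_plain(p, n)]
```

For `n = 5` the list `favourable` runs from 1 to 6, in agreement with the statement that matrix-squaring pays only for six or fewer unknowns at this number of terms; raising `n` lengthens the list.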
If more than one system of equations with the same matrix $A$ is to be solved,
these methods have the advantage that the same matrix $f$ can be used for all
the vectors $G$ of right-hand members. In such cases the value of matrix-squaring
is enhanced in comparison with that in which only a single system of equations
is to be solved. Determination of $A^{-1}$ is equivalent to solving $p$ such
systems in which the several column vectors $G$ together constitute the identical
matrix. If more than $p$ of these systems of equations are to be solved it is best
to find $A^{-1}$ and then form the various solutions $A^{-1}G$ from the columns $G$ of
right-hand members.

It is worth noticing that the matrices $1 + f$, $1 + f^2$, etc., are commutative, as
are all rational functions of a single matrix. In difficult cases this may occasionally
provide a useful check.

This method differs from the other iterative methods with which we are
concerned in that errors of calculation are not automatically corrected by it. 
This is a serious disadvantage, especially for the inexperienced computer, and 
makes desirable the careful maintenance of a check column. On the other hand, 
it does not require any preliminary knowledge of the solution. Indeed, it should 
be classified rather with the direct than with the iterative procedures on this 
account. 

The critical element in determining the success of this method is the possibility
or impossibility of finding suitable matrices $V$ and $W$, such that $V^{-1}$ can
be calculated easily, and such that the elements of $f = -V^{-1}W$ are sufficiently
small to make the roots all numerically less than unity. Morris uses for $V$ the
matrix derived from $A$ by replacing all the elements above the principal diagonal
by zeros. This insures that the corresponding positions in $V^{-1}$ are also occupied
by zeros. The other elements of $V^{-1}$ are then determined fairly easily. If the
non-diagonal elements of $A$, which appear in $W$, are sufficiently small, this fact
will insure small enough elements in $f$ to make convergence rapid.

A second method, given by Frazer, Duncan and Collar, chooses for $V$ a
diagonal matrix (one having only zero elements except in the principal diagonal),
or simply the unit matrix. This choice reduces the labor of inversion to a
minimum. Successful convergence will take place when the non-diagonal
elements of $A$ are sufficiently small in comparison with those in the diagonal,
if $V$ is taken as the diagonal matrix containing the diagonal elements of $A$.

A third method which may be useful in certain cases, particularly when some
but not all of the unknowns are required, is the following. Let $A$ be partitioned:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

where $a$ and $d$ are square submatrices which, being of lower order than $A$, are
more easily inverted. Let $V$ and $W$ be the correspondingly partitioned matrices

$$V = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}, \qquad W = \begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix}.$$

Putting $s = a^{-1}b$, $t = d^{-1}c$, we have:

$$f = -\begin{bmatrix} 0 & s \\ t & 0 \end{bmatrix}, \qquad f^2 = \begin{bmatrix} st & 0 \\ 0 & ts \end{bmatrix}, \qquad f^4 = \begin{bmatrix} (st)^2 & 0 \\ 0 & (ts)^2 \end{bmatrix}.$$

If only the first $q$ of the $p$ unknowns are required, $a$ and $b$ may be taken as
matrices of $q$ rows. If $G_1$ and $H_1$ consist respectively of the first $q$ rows of $G$
and $H$, and if $G_2$ and $H_2$ consist of the remaining rows, then $H_1 = a^{-1}G_1$ and
$H_2 = d^{-1}G_2$. Then, in case of convergence, the first $q$ rows of the solution
are given by

$(1 + st)[1 + (st)^2][1 + (st)^4] \cdots H_1 - s(1 + ts)[1 + (ts)^2][1 + (ts)^4] \cdots H_2$.

Convergence to the correct values is assured here if the norm of any power of
$st$ is less than unity, as is true if and only if the absolute values of all the roots of
$st$ are less than unity. This is easily seen to be true, since as $m$ increases

$\lim (ts)^m = t[\lim (st)^{m-1}]s$.

10. A simple iterative method of solving equations. An entirely different
method, whose convergence is independent of the initial trial values, is the
following. To solve for the column vector $X$ the equation $AX = G$, we may
start with an arbitrary column of trial values $X_0$ and a scalar constant $h$, and
then for $m = 1, 2, \cdots$ calculate $X_m$ from

$X_m = hG + (1 - hA)X_{m-1}$.

If $X_m$ is equal to $X_{m-1}$ it is obviously the desired solution. Otherwise there
is an error

$X_m - X = (hG - X) + (1 - hA)X_{m-1} = (hA - 1)X + (1 - hA)X_{m-1}$

$= (1 - hA)(X_{m-1} - X) = \cdots = (1 - hA)^m(X_0 - X)$.

This converges to zero as $m$ increases provided the latent roots of $1 - hA$ are
all less than unity in absolute value. If $A$ has only real roots this is equivalent
to requiring that they all be between 0 and $2/h$. In particular, if $A$ is a correlation
matrix, its roots are all real and positive. Since their sum is $\mathrm{tr}(A) = p$,
where $p$ is the number of rows, all roots of $A$ lie between 0 and $p$. Consequently
the process will converge in this case if $0 < h < 2/p$. It is desirable, in
order to make the error diminish as fast as possible, to take $h$ as large as is consistent
with convergence. In some cases a lower limit than $p$ will be known for
the greatest root of $A$, and then a larger value than $2/p$ can be taken for $h$.
A limit of error is obviously set by

$N(X_m - X) \le [N(1 - hA)]^m N(X_0 - X)$.

This method can of course be applied to find the inverse matrix. 
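A minimal sketch of the process for a correlation matrix follows. The 3-rowed matrix, the right-hand members and the choice h = .6 (below 2/p = 2/3) are made-up illustration values, and the names are assumptions.

```python
# Simple iteration X_m = hG + (1 - hA) X_{m-1} for a correlation matrix A.

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

A = [[1.0, 0.5, 0.2],     # made-up correlation matrix, p = 3
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
G = [1.0, 2.0, 3.0]
p = len(A)
h = 0.6                   # any 0 < h < 2/p secures convergence here

X = [0.0] * p             # arbitrary trial values
for m in range(200):
    AX = matvec(A, X)
    # hG + (1 - hA)X, written as X + h(G - AX)
    X = [x + h * (g - ax) for x, g, ax in zip(X, G, AX)]

defect = max(abs(g - ax) for g, ax in zip(G, matvec(A, X)))
```

The starting vector is immaterial: any other choice of `X` before the loop leads to the same limit, only the number of steps needed for a given accuracy changes.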

It can also be accelerated by matrix-squaring. If we put $D = 1 - hA$ we
have for example,

$X_8 = (1 + D)(1 + D^2)(1 + D^4)hG + D^8X_0$.

The last term will approach zero in case of convergence, and may be omitted
in this type of calculation.

Thus accelerated, the method gives decimal places of accuracy increasing in
geometric instead of arithmetic progression, and is remarkably simple and
straightforward. It is at its best when the roots of $A$ are known to be closely
clustered about unity. A criterion of this is that $\sum (\lambda_i - \bar\lambda)^2$ shall have a small
value, where $\bar\lambda$ is the mean of the $p$ roots $\lambda_i$. This sum of squares equals
$\sum \lambda_i^2 - p$ for a correlation matrix $A$, and
$\sum \lambda_i^2 = \mathrm{tr}(A^2) = \sum \sum a_{ij}^2 = p + 2 \sum_{i<j} a_{ij}^2$,
so that

$\sum (\lambda_i - \bar\lambda)^2 = 2 \sum_{i<j} a_{ij}^2$.

Smallness of this quantity is favorable not only to this iterative method but
also to those of §§4 and 5.
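The criterion requires no knowledge of the roots, as the following sketch shows. The correlation matrix is made-up illustration data and the function name is hypothetical; the second quantity computes the same sum through tr(A^2) - p as a check.

```python
# Spread of the roots of a correlation matrix about their mean, which is 1.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def spread_of_roots(R):
    """2 * sum over i < j of r_ij^2, equal to the sum of (lambda_i - 1)^2."""
    p = len(R)
    return 2 * sum(R[i][j] ** 2 for i in range(p) for j in range(i + 1, p))

R = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
p = len(R)
R_squared = matmul(R, R)
via_trace = sum(R_squared[i][i] for i in range(p)) - p   # tr(R^2) - p
via_elements = spread_of_roots(R)
```

Both quantities equal .76 for this matrix; a value near zero would indicate roots clustered closely about unity.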

11. Use of the characteristic equation for inversion and for finding determinants.
A method differing greatly from the others is based on the Cayley-Hamilton
theorem that every matrix satisfies its own characteristic equation
[29, p. 23; 4, p. 296]. This is the equation

$$f(\lambda) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} - \lambda \end{vmatrix} = e_p - e_{p-1}\lambda + e_{p-2}\lambda^2 - \cdots + (-1)^{p-1}e_1\lambda^{p-1} + (-1)^p\lambda^p = 0,$$

where $e_r$ $(r = 1, 2, \cdots, p)$ is the sum of the products $r$ at a time of the roots,
and also equals the sum of the $r$-rowed principal minors of the matrix $A$. Substituting
$A$ for $\lambda$, which by the Cayley-Hamilton theorem is legitimate, multiplying
by $A^{-1}$, and transposing a term, yields

(11.1)    $e_pA^{-1} = e_{p-1} - e_{p-2}A + e_{p-3}A^2 - \cdots + (-1)^p e_1A^{p-2} + (-1)^{p+1}A^{p-1}$.

This equation provides a direct method of calculating $A^{-1}$ as soon as the elementary
symmetric functions $e_r$ of the roots of $f(\lambda) = 0$ have been evaluated.
This evaluation may be accomplished by means of Newton's identities [4, p. 243]
connecting the elementary symmetric functions with the power-sums. If $s_r$
is the sum of the $r$th powers of the roots, these formulae give:

$e_1 = s_1$

$e_2 = \frac{1}{2}(e_1s_1 - s_2)$

$e_3 = \frac{1}{3}(e_2s_1 - e_1s_2 + s_3)$

$\cdots$

$e_p = \frac{1}{p}(e_{p-1}s_1 - e_{p-2}s_2 + \cdots \pm s_p)$.
The procedure is to calculate in turn $A^2, A^3, \cdots, A^{p-1}$, then to obtain the $s$'s
from the diagonals of these matrices, since $s_r = \mathrm{tr}(A^r)$, then to obtain the elementary
symmetric functions $e_1, \cdots, e_{p-1}$ of the roots from Newton's formulae,
and to substitute these in the right-hand member of (11.1). It is then only
necessary to find and divide by $e_p$, which equals the determinant of $A$. For
this, and for checking the calculations, there is a choice of methods. We may
find the diagonal of $A^p$, without troubling to compute the whole of this matrix,
from the product $AA^{p-1}$ and also, to provide a comprehensive check, from
$A^{p-1}A$ or possibly from the product of two powers of $A$ of exponents approximating
$p/2$. The sum of these diagonal elements of $A^p$ is $s_p$, which may be
substituted in the last of the Newton formulae above with the quantities previously
found to give $e_p$. An alternative method is to multiply $A$ by its adjoint
$e_pA^{-1}$, which is computed by (11.1), to obtain the determinant $e_p$.
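The whole procedure is compact enough to be sketched in a few lines. The code below is an illustration under stated assumptions: it works in exact rational arithmetic, it forms the whole of $A^p$ rather than its diagonal only, the function name is hypothetical, and the test matrix is the non-symmetric 5-rowed matrix of the illustration given later in this section.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def invert_by_characteristic_equation(A):
    """Return (s, e, adjoint, inverse) using Newton's identities and (11.1)."""
    p = len(A)
    identity = [[Fraction(int(i == j)) for j in range(p)] for i in range(p)]
    powers = [identity, [[Fraction(x) for x in row] for row in A]]
    while len(powers) <= p:                      # powers[r] = A^r, r = 0..p
        powers.append(matmul(powers[-1], powers[1]))
    s = [None] + [trace(powers[r]) for r in range(1, p + 1)]   # s_r = tr(A^r)
    e = [Fraction(1)]                            # e_0 = 1
    for r in range(1, p + 1):                    # r e_r = e_{r-1}s_1 - e_{r-2}s_2 + ...
        total = sum((-1) ** (j - 1) * e[r - j] * s[j] for j in range(1, r + 1))
        e.append(total / r)
    # (11.1): the coefficient of A^j on the right is (-1)^j e_{p-1-j}.
    adjoint = [[sum((-1) ** j * e[p - 1 - j] * powers[j][r][c] for j in range(p))
                for c in range(p)] for r in range(p)]
    inverse = [[x / e[p] for x in row] for row in adjoint]     # fails if e_p = 0
    return s, e, adjoint, inverse

A = [[15, 11, 6, -9, -15],
     [1, 3, 9, -3, -8],
     [7, 6, 6, -3, -11],
     [7, 7, 5, -3, -11],
     [17, 12, 5, -10, -16]]
s, e, adjoint, inverse = invert_by_characteristic_equation(A)
```

For this matrix the power-sums come out as 5, -41, -217, -17, 3185 and the elementary symmetric functions as 5, 33, 51, 135, -225, the last being the determinant. A singular matrix would give a zero last value and the final division would fail.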

The total number of multiplications, divisions, and additions is distinctly
greater by this method than by efficient direct methods such as that of Dwyer
[7, 9]. On the other hand, this method is straightforward and easily checked;
the divisions involved are of the simplest character, consisting only of the
divisions by $2, 3, \cdots, p$ in Newton's formulae and of the final division of the
adjoint matrix by one number; and for large matrices it is ideally adapted for
matrix multiplication by means of punched cards. A further very important
advantage of this characteristic function method is that it yields considerable
additional information as a by-product. Not only the determinant of the
matrix but the sums $e_r$ of the principal minors of each order $r$ are determined.
Moreover the characteristic equation, whose coefficients would be exceedingly
difficult to compute directly from definitions for a large matrix, is by this method
made available for the study of the latent roots, which have great interest in
themselves for numerous purposes.

The characteristic function method is applicable whether $A$ is symmetric or
not. If it is symmetric, the same is true of each of the other matrices appearing
in the calculation, so that it is necessary to write only about half the elements.

An illustration using a symmetric matrix has been given by M. D. Bingham
[3]. In the illustration below the matrix is not symmetric and has complex
double roots and non-linear elementary divisors, so that evaluation of the roots
by iterative methods, though possible, would be very slow and laborious, as
shown by Aitken [2]. This is indeed the same example used by Aitken in this
discussion. But it should be noted that the associated latent vectors, which
are determined along with the roots in the iterative processes, require the
solution of sets of $p - 1$ linear equations if the roots are found directly by solving
the characteristic equation.


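Let

\[ A = \begin{bmatrix}
15 & 11 & 6 & -9 & -15 \\
1 & 3 & 9 & -3 & -8 \\
7 & 6 & 6 & -3 & -11 \\
7 & 7 & 5 & -3 & -11 \\
17 & 12 & 5 & -10 & -16
\end{bmatrix}. \]

Then

\[ A^2 = \begin{bmatrix}
-40 & -9 & 105 & -9 & -40 \\
-76 & -43 & 32 & 44 & 23 \\
-55 & -22 & 62 & 20 & -10 \\
-61 & -25 & 65 & 20 & -7 \\
-40 & -9 & 110 & -14 & -40
\end{bmatrix}, \qquad
A^3 = \begin{bmatrix}
-617 & -380 & 64 & 499 & 256 \\
-260 & -189 & -316 & 355 & 280 \\
-443 & -279 & -106 & 415 & 259 \\
-464 & -300 & -136 & 439 & 292 \\
-617 & -385 & 69 & 499 & 256
\end{bmatrix}, \]

\[ A^4 = (A^2)^2 = \begin{bmatrix}
-1342 & -978 & -2963 & 2444 & 2006 \\
944 & 522 & -1982 & -10 & 503 \\
-358 & -333 & -2435 & 1307 & 1334 \\
-175 & -243 & -2645 & 1247 & 1355 \\
-1312 & -963 & -2978 & 2444 & 1991
\end{bmatrix} = AA^3 \text{ (check)}. \]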

From the diagonals of these matrices,

\[ s_1 = 5, \quad s_2 = -41, \quad s_3 = -217, \quad s_4 = -17. \]

Calculating the sum of the diagonal elements only of $A^5$ (on a machine, without
listing them separately) from $AA^4$ and also, as a check, from $A^2A^3$ we find
$s_5 = 3185$. Newton's formulae then give

\[ a_1 = 5, \quad a_2 = 33, \quad a_3 = 51, \quad a_4 = 135, \quad a_5 = -225, \]

the last value being that of the determinant of $A$. We readily find from (11.1),

\[ A^{-1} = -\frac{1}{225}\begin{bmatrix}
-207 & 64 & -124 & 111 & 171 \\
-315 & 30 & 195 & -180 & 270 \\
-315 & 30 & -30 & 45 & 270 \\
-225 & 75 & -75 & 0 & 225 \\
-414 & 53 & 52 & -3 & 342
\end{bmatrix}. \]

So far, all results by this method are exact, but the division by 225 introduces
recurring decimals and therefore a limited validity for the form

\[ A^{-1} = \begin{bmatrix}
.9200 & -.2844 & .5511 & -.4933 & -.7600 \\
1.4000 & -.1333 & -.8667 & .8000 & -1.2000 \\
1.4000 & -.1333 & .1333 & -.2000 & -1.2000 \\
1.0000 & -.3333 & .3333 & 0 & -1.0000 \\
1.8400 & -.2356 & -.2311 & .0133 & -1.5200
\end{bmatrix}. \]

The characteristic equation

\[ f(\lambda) = \lambda^5 - 5\lambda^4 + 33\lambda^3 - 51\lambda^2 + 135\lambda + 225 = 0 \]

may in this case be solved readily, since

\[ f(\lambda) = (\lambda + 1)(\lambda^2 - 3\lambda + 15)^2. \]

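
The worked example can be retraced mechanically. The following is a minimal sketch in Python, using exact rational arithmetic. It assumes that (11.1), which is not reproduced in this part of the text, is the usual Cayley-Hamilton expression for the adjoint, $a_pA^{-1} = (-1)^{p-1}\sum_j (-1)^j a_j A^{p-1-j}$ with $a_0 = 1$. The function names are illustrative only, and the sketch forms the whole of $A^p$ rather than its diagonal alone, purely for brevity.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def characteristic_method(A):
    """Return (s, a, adj): power sums s_r = tr(A^r), the coefficients a_r
    from Newton's formulae, and the adjoint a_p * A^{-1}.
    The adjoint formula is an assumed form of (11.1)."""
    p = len(A)
    identity = [[int(i == j) for j in range(p)] for i in range(p)]
    powers = [identity, A]                      # powers[r] holds A^r
    for _ in range(2, p + 1):
        powers.append(matmul(A, powers[-1]))
    s = [sum(powers[r][i][i] for i in range(p)) for r in range(1, p + 1)]
    a = [Fraction(1)]                           # a[0] = 1 by convention
    for r in range(1, p + 1):
        # Newton: r*a_r = a_{r-1}*s_1 - a_{r-2}*s_2 + ... +/- s_r
        total = sum((-1) ** (j - 1) * a[r - j] * s[j - 1]
                    for j in range(1, r + 1))
        a.append(total / r)
    # Cayley-Hamilton: adj(A) = (-1)^(p-1) * sum_j (-1)^j a_j A^(p-1-j)
    adj = [[(-1) ** (p - 1) * sum((-1) ** j * a[j] * powers[p - 1 - j][i][k]
                                  for j in range(p))
            for k in range(p)] for i in range(p)]
    return s, a[1:], adj

# The matrix of the illustration.
A = [[15, 11, 6, -9, -15],
     [1, 3, 9, -3, -8],
     [7, 6, 6, -3, -11],
     [7, 7, 5, -3, -11],
     [17, 12, 5, -10, -16]]

s, a, adj = characteristic_method(A)
det = a[-1]                                     # a_p is the determinant
inverse = [[entry / det for entry in row] for row in adj]
```

Because every quantity is kept as an exact fraction until the final division, the product of `A` and `inverse` can be compared with the unit matrix exactly, which is the kind of comprehensive check recommended above.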
III. Latent Roots and Vectors 

12. Direct and iterative methods. If the latent roots but not the latent
vectors of a matrix are desired, as for example in a preliminary study of
vibrations in machinery being designed, where the important question is whether any
root has a positive real part, it is only necessary to find the characteristic equation
and to work with it by the methods of the theory of equations. The coefficients
in the characteristic equation are the sums of the $r$-rowed principal minors
($r = 1, 2, \ldots, p$), and are expeditiously found directly from this definition for
matrices of four or fewer rows. For large matrices, however, the calculation of
so many large overlapping determinants is wasteful of effort, since many
virtually equivalent calculations must be done repeatedly. Indeed, calculation by
determinants in a great many situations, including the solution of linear
equations, is open to this objection. The methods of §11 yield the characteristic
function in a manner which, for large matrices, appears to be the best available,
excepting perhaps the new method of Samuelson [25a].

When, as is commonly the case, the latent vectors are desired, a
straightforward calculation directly from the definitions would require not only setting
up and solving the characteristic equation, but also the solution, in the case of
each root, of the set of linear equations in $p$ unknowns whose matrix is obtained
from the characteristic matrix by substituting the particular root for $\lambda$. It is
this solution of linear equations that aggravates greatly the computational
labor when direct methods are used.

An ingenious method has been used by R. A. Fisher [11, pp. 2t)H IT.]. Starting 
with a four-rowed determinant whose elements are linear functions of an
unknown $\theta$, Fisher calculates the value of the determinant for selected values of $\theta$,
and then by interpolation using divided differences finds the largest value of $\theta$
making the determinant zero. The point of the divided difference method
is that it avoids the direct calculation of the determinant for more than a few
values of $\theta$, replacing it essentially by calculation of the fourth-degree
polynomial in $\theta$ from its differences and using the fact that the fourth divided
differences are constant. The linear equations are then solved in a direct manner.
If applied to large matrices this would be very laborious, but it compares
favorably with calculation directly from definitions in the manner suggested by
reading books on algebra and solid analytic geometry. But even with large
matrices Fisher's method may perhaps be the best in certain cases, e.g. if all
that is desired is the root of median absolute value and if this root is real, or if
it is desired to find a few real roots that are close together, with numerous others
greater and another numerous group less than these. This is because the
iterative methods give the real roots in the order of their absolute values, beginning
with the greatest, but with the possibility of obtaining them in the opposite
order by first inverting the matrix. The Mallock electrical device [22] may be
used to calculate determinants, and thus to apply this method.

If $A$ and $B$ are $p$-rowed matrices and $B$ is non-singular, the determinantal
equation $|A - \lambda B| = 0$ is equivalent to $|AB^{-1} - \lambda| = 0$ and also
to $|B^{-1}A - \lambda| = 0$. The column vectors $X_i$ satisfying $(A - \lambda_iB)X_i = 0$ also
satisfy $(B^{-1}A - \lambda_i)X_i = 0$ and the row vectors $V_i$ satisfying $V_i(A - \lambda_iB) = 0$
also satisfy $V_i(AB^{-1} - \lambda_i) = 0$. If $A$ and $B$ are symmetric, $V_i = X_i'$. Thus
any problem of this type is reducible to that of finding latent roots and vectors,
upon calculating $B^{-1}$ by any method and multiplying in either order by $A$.

The fundamental iterative method for finding latent roots and vectors of $A$
begins with an arbitrary matrix $X_0$ of a single column. This column vector
is premultiplied by $A$ to obtain a new column vector $X_1$. If, as is possible
though unlikely, the elements of $X_1$ are proportional to those of $X_0$, they
constitute one of the latent vectors of $A$, and the factor of proportionality is the
corresponding root, for then $X_0$ and $X_1$ are solutions of the matrix equation
$(A - \lambda)X = 0$. It should be observed that the latent vector is determined only
to within an arbitrary scalar factor of proportionality, though we may
sometimes find it convenient to normalize the vector by choosing the factor in such
a way that the sum of the squares of the elements, which equals the square of
the norm, is unity.

If $X_1$ is not proportional to $X_0$, the operation may be repeated by calculating
$X_2 = AX_1$, then $X_3 = AX_2$, and so on. If these vectors are then normalized,
or if they are divided by, say, their respective first elements, then the other
elements will (in the cases of greatest practical importance) gradually approach
stable values which will determine one of the latent vectors, while the
successive factors of proportionality will approach the corresponding root. The
convergence of this process is however apt to be rather slow. Fortunately
there are several known ways of accelerating it.
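
As a minimal sketch of the fundamental iteration just described, the following Python fragment divides each trial vector by its first element and reports the last factor of proportionality. The 2 x 2 matrix, the starting vector and the function names are illustrative assumptions, not taken from the paper; the sketch also assumes the first element of every trial vector is different from zero.

```python
def matvec(A, x):
    """Premultiply the column vector x by the matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def iterate(A, x0, steps):
    """Repeat X_{t+1} = A X_t, rescaling so the first element stays 1.
    Returns the last factor of proportionality and the last trial vector."""
    x = list(x0)
    factor = None
    for _ in range(steps):
        y = matvec(A, x)
        factor = y[0] / x[0]          # estimate of the root
        x = [v / y[0] for v in y]     # divide by the first element
    return factor, x

# Illustrative symmetric matrix with latent roots 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
root, vector = iterate(A, [1.0, 0.0], 40)
```

For this matrix the factor approaches the greatest root 3 and the trial vector approaches (1, 1); the error shrinks by a factor of about 1/3 per step, which illustrates why acceleration is worth having when the two largest roots are close together.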

Matrix-squaring is the first of these methods of accelerating convergence
[17, 19]. It is clear that $X_t = A^tX_0$. Consequently one application of the
iterative process with $A^t$ is equivalent to $t$ iterations with $A$. It is relatively
easy to square $A$, and then by repeated squarings to form $A^4, A^8, A^{16}$, etc. The
economical limit of this process is determined partly by the necessity of
retaining more and more digits in the successively higher powers, but up to a
point not yet determined exactly it presents very great advantages. For
proceeding to the determination of latent roots of other than the maximum absolute
value, with their associated vectors, this method lends itself to further
shortcuts [17, 2], which seem to give it an advantage over an older method [13].
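
The equivalence between one multiplication by $A^{16}$ and sixteen multiplications by $A$ can be checked directly. This is a small sketch with an illustrative integer matrix of my own choosing (not from the paper), so that the comparison is exact.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Premultiply the column vector x by the matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1], [1, 2]]      # illustrative matrix, latent roots 3 and 1
x0 = [1, 0]

# Four squarings give A^2, A^4, A^8, A^16.
power = A
for _ in range(4):
    power = matmul(power, power)
x16_by_squaring = matvec(power, x0)

# Sixteen ordinary iterations with A.
x = x0
for _ in range(16):
    x = matvec(A, x)
x16_direct = x
```

The growth of the entries (eight digits after only sixteen steps here) is the practical limit referred to above: each squaring roughly doubles the number of digits that must be retained.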

Another method of accelerating convergence, introduced by A. C. Aitken,
and referred to by him as the $\delta^2$-method, uses the ratio $\phi(t)$ of an element of
$X_{t+1}$ to the corresponding element of $X_t$ in the function

\[ \frac{\phi(t+1)\phi(t-1) - [\phi(t)]^2}{\phi(t+1) - 2\phi(t) + \phi(t-1)}, \]

which converges rapidly toward the root $\lambda_1$ of greatest absolute value. If a
constant $c$ is subtracted from all three of the quantities $\phi(t+1)$, $\phi(t)$ and $\phi(t-1)$
before computing the foregoing function, the result is merely diminished by $c$,
which may be restored afterwards. This fact
reduces greatly the computational labor, since the decimal places of $\lambda_1$ already
determined are common to all three.
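
A brief sketch of the $\delta^2$ extrapolation follows, in exact rational arithmetic. The matrix and starting vector are illustrative assumptions (latent roots 3 and 1), and the names `aitken` and `first_element_ratios` are hypothetical helpers introduced only for this example.

```python
from fractions import Fraction

def aitken(prev, cur, nxt):
    """Aitken's function of three consecutive ratios phi(t-1), phi(t), phi(t+1)."""
    return (nxt * prev - cur * cur) / (nxt - 2 * cur + prev)

def first_element_ratios(A, x0, count):
    """phi[t] = (first element of X_{t+1}) / (first element of X_t)."""
    x = [Fraction(v) for v in x0]
    ratios = []
    for _ in range(count):
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        ratios.append(y[0] / x[0])
        x = y
    return ratios

A = [[2, 1], [1, 2]]                       # illustrative; greatest root is 3
phi = first_element_ratios(A, [1, 0], 4)   # 2, 5/2, 14/5, 41/14
extrapolated = aitken(phi[1], phi[2], phi[3])

# Subtracting a constant c from all three ratios lowers the result by c.
c = 2
shifted = aitken(phi[1] - c, phi[2] - c, phi[3] - c)
```

Here the last plain ratio 41/14 is in error by 1/14, while the extrapolated value 121/40 is in error by only 1/40; the shifted computation returns 41/40, to which `c` is added back.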

If $A$ is symmetric and we form the scalar products of $X_t = A^tX_0$ with itself
and with $X_{t+1}$ we have

\[ X_t'X_t = X_0'A^{2t}X_0, \qquad X_t'X_{t+1} = X_0'A^{2t+1}X_0. \]

The ratio of these two scalars gives an estimate of $\lambda_1$ which on the basis of the
ratios of consecutive elements in a given place in the trial vectors would not be
reached until a later stage of convergence, corresponding in fact to twice as
many iterations. Aitken has pointed out the great value of this procedure for
finding the root (but not the latent vector), and has extended the idea to
unsymmetric matrices, where there is a complication because of the existence of two
latent vectors for each root, one determined by premultiplying by $A$, the other
by $A'$.
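
The gain from the scalar-product ratio can be seen numerically. The sketch below uses an illustrative symmetric matrix and starting vector of my own choosing; for this particular choice the agreement with the element ratio at stage $2t$ happens to be exact, whereas in general it holds only to the order of approximation described above.

```python
from fractions import Fraction

def matvec(A, x):
    """Premultiply the column vector x by the matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

A = [[2, 1], [1, 2]]            # illustrative symmetric matrix, greatest root 3
trial = [[1, 0]]                # trial[t] holds X_t
for _ in range(8):
    trial.append(matvec(A, trial[-1]))

t = 3
# Ratio of the scalars X_t'X_{t+1} and X_t'X_t.
scalar_estimate = Fraction(dot(trial[t], trial[t + 1]), dot(trial[t], trial[t]))
# Ratio of consecutive first elements at the same stage t ...
element_estimate = Fraction(trial[t + 1][0], trial[t][0])
# ... and at stage 2t, i.e. after twice as many iterations.
element_estimate_2t = Fraction(trial[2 * t + 1][0], trial[2 * t][0])
```

With three iterations the scalar-product ratio is 1094/365, about 2.9973, which the element ratio reaches only at the sixth iteration.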

The comprehensive paper [2] of Aitken gives an extremely valuable account
of the whole problem and processes of finding the latent roots and vectors,
including a survey of the various cases arising when there are multiple roots,
complex roots, and non-linear elementary divisors. This paper should be studied
carefully by anyone with any substantial numerical problem of this kind.

A method using rotations of two variables at a time has been devised by T. L.
Kelley [21].

The remainder of this paper will be concerned with some results, believed
to be new, by which useful upper limits can be set for the errors of the results
yielded by iteration for latent roots and vectors of a symmetric matrix. To
find such limits of error for asymmetric matrices appears to be a much more
difficult and as yet unsolved problem.


13. Accuracy of iteration with symmetric matrices. If $A$ is symmetric, as
it is in most statistical problems (though with some exceptions, as in [18]), the
roots are all real and the elementary divisors are linear. Moreover there exist
an orthogonal matrix $H$ and a diagonal matrix

\[ \Lambda = \begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots \\
0 & \lambda_2 & 0 & \cdots \\
0 & 0 & \lambda_3 & \cdots \\
\cdots & \cdots & \cdots & \cdots
\end{bmatrix} \]

such that

\[ A = H\Lambda H'. \tag{13.1} \]

Since $H$ is orthogonal, $HH' = 1$, and therefore

\[ \Lambda = H'AH, \qquad A^t = H\Lambda^tH'. \tag{13.2} \]

We may associate with the successive trial vectors $X_t = AX_{t-1} = A^tX_0$ the
vectors $Y_t = H'X_t$; then $X_t = HY_t$. From these equations and the second
of (13.2) it is clear that

\[ Y_t = H'X_t = H'A^tX_0 = \Lambda^tH'X_0 = \Lambda^tY_0. \]

Hence, if the elements of $Y_0$ are $y_1, \ldots, y_p$,

\[ Y_t = \begin{bmatrix} y_1\lambda_1^t \\ y_2\lambda_2^t \\ \vdots \\ y_p\lambda_p^t \end{bmatrix}. \]

Now let $\alpha_{ht}$ be the scalar product of $X_t$ and $X_{t-h}$:

\[ \alpha_{ht} = X_t'X_{t-h} = Y_t'Y_{t-h} = \sum_i y_i^2\lambda_i^{2t-h}; \tag{13.3} \]

and let

\[ v_{kt} = \frac{\alpha_{kt}}{\alpha_{0t}} = \frac{\sum y_i^2\lambda_i^{2t-k}}{\sum y_i^2\lambda_i^{2t}}. \tag{13.4} \]


If $A$ has a negative root this fact will become evident after a certain stage in
the iteration used to obtain this root by an alternation of sign of the numbers
in any one position in consecutive trial vectors. However $A^2$, which as pointed
out in §12 may well be calculated anyhow, has only positive roots, which are
the squares of the roots of $A$, and has the same latent vectors as $A$. Hence we
shall have results of sufficient generality for real symmetric matrices if we
assume that all roots of the matrix with which we work are positive or zero, i.e.
that it is positive definite or semi-definite. Let us choose the notation so that

\[ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0. \]

Then if $k > 0$,

\[ \alpha_{0t} = \sum y_i^2\lambda_i^{2t} \le \lambda_1^k\sum y_i^2\lambda_i^{2t-k} = \lambda_1^k\alpha_{kt}. \]

Hence, by (13.4),

\[ \lambda_1 \ge [v_{kt}]^{-1/k}. \tag{13.5} \]

It is known [23] that if $a_1, \ldots, a_p$, $c_1, \ldots, c_p$ are any positive numbers, the
function

\[ \left(\frac{c_1a_1^k + \cdots + c_pa_p^k}{c_1 + \cdots + c_p}\right)^{1/k} \]

increases monotonically with $k$. Putting $c_i = y_i^2\lambda_i^{2t}$, $a_i = \lambda_i^{-1}$ if $\lambda_i \ne 0$, and
$c_i = a_i = 0$ if $\lambda_i = 0$, we find that the right-hand member of (13.5) decreases
monotonically as $k$ increases. Hence the best of these lower bounds for $\lambda_1$ is
that corresponding to the least value of $k$ that can be used, namely $k = 1$.
Consequently the lower bound to be recommended for $\lambda_1$ is given by

\[ \lambda_1 \ge \frac{1}{v_{1t}} = \frac{\alpha_{0t}}{\alpha_{1t}}. \tag{13.6} \]

Provided $y_1 \ne 0$, this lower bound approaches $\lambda_1$ when $t$ increases.

An upper bound for $\lambda_1$ is available from the fact that the sum of the $t$th powers
of the roots is the trace of $A^t$. Since we assume all $\lambda_i \ge 0$ this gives

\[ \lambda_1 \le (\operatorname{tr} A^t)^{1/t}. \tag{13.7} \]

That this upper limit converges to $\lambda_1$ when $t$ increases is easily seen from (13.7)
upon consideration of $\log\,(\sum\lambda_i^t)^{1/t}$.

A lower limit alternative to that of (13.6) is also available from $\operatorname{tr}(A^t)$, and
likewise converges to $\lambda_1$. Indeed, since $\lambda_1$ is the greatest root, we have

\[ \lambda_1 \ge (\operatorname{tr} A^t/p)^{1/t}. \]
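
The three bounds for the greatest root can be evaluated together. The sketch below assumes a symmetric matrix with no negative roots, as required above; the matrix, starting vector and function names are illustrative and not from the paper.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Premultiply the column vector x by the matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def root_bounds(A, x0, t):
    """Return (lower, lower_from_trace, upper) for the greatest root of a
    symmetric matrix A whose roots are all positive or zero."""
    p = len(A)
    x_prev = list(x0)
    for _ in range(t - 1):
        x_prev = matvec(A, x_prev)          # X_{t-1}
    x_t = matvec(A, x_prev)                 # X_t
    lower = dot(x_t, x_t) / dot(x_t, x_prev)   # alpha_0t / alpha_1t = 1 / v_1t
    power = A
    for _ in range(t - 1):
        power = matmul(power, A)            # A^t
    trace = sum(power[i][i] for i in range(p))
    upper = trace ** (1.0 / t)
    lower_from_trace = (trace / p) ** (1.0 / t)
    return lower, lower_from_trace, upper

A = [[2, 1], [1, 2]]                        # illustrative; roots 3 and 1
lower, lower_from_trace, upper = root_bounds(A, [1, 0], 4)
```

After four iterations the recommended lower bound is 3281/1094, about 2.9991, and the upper bound is the fourth root of 82, about 3.0092, so the greatest root 3 is already enclosed to within about 0.01; the trace-based lower bound (about 2.53) is much weaker at this stage.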

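We now seek limits of accuracy for the latent vector corresponding to $\lambda_1$ and
estimated by $X_t$. If we call this vector $X$, and define $Y = H'X$,
then $\lim Y_t^* = Y^*$, where $Y_t^*$ is the normalized form of $Y_t$. $Y_t^*$ has as its $i$th
element

\[ \frac{y_i\lambda_i^t}{\sqrt{\sum_j y_j^2\lambda_j^{2t}}}. \]

If $y_1 \ne 0$, and $\lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots$, this limit is $\pm 1$ if $i = 1$, and is otherwise 0. We
take the value of $Y$ to be

\[ Y = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]

in this case. If $\lambda_1$ is a multiple root, the limit of $X_t^*$ will depend on the initial
values $y_i$.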


A useful measure of the closeness of approach of $X_t$ to $X$ is the "correlation
coefficient"

\[ r_t = X^{*\prime}X_t^* = Y^{*\prime}Y_t^* = \frac{y_1\lambda_1^t}{\sqrt{\sum y_i^2\lambda_i^{2t}}}, \]

which obviously approaches unity as $t$ increases if $y_1 \ne 0$ and $\lambda_1$ is a simple
root, or if $\lambda_1 = \lambda_2 = \cdots = \lambda_s > \lambda_{s+1}$ and we arrange our definitions so that
$y_1 \ne 0$ and $y_2 = \cdots = y_s = 0$. In terms of the notation previously introduced,

\[ r_t^2 = \frac{y_1^2\lambda_1^{2t}}{\alpha_{0t}}. \]

The sum of the squares of the differences of corresponding elements
of the normalized vectors $X^*$ and $X_t^*$, i.e. $(X^* - X_t^*)'(X^* - X_t^*)$, is $2(1 - r_t)$, and
therefore approaches zero as $r_t$ approaches unity. We shall seek for $r_t$ a lower
limit approaching unity as $t$ increases.


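Let us now put

\[ w_{it} = \frac{y_i^2\lambda_i^{2t}}{\alpha_{0t}}. \]

For $k \ge 1$,

\[ \alpha_{kt} \ge \sum_{i=2}^{p} y_i^2\lambda_i^{2t-k}
\ge \lambda_1^{-k}(y_2^2\lambda_2^{2t} + \cdots + y_p^2\lambda_p^{2t})
= \lambda_1^{-k}\alpha_{0t}(w_{2t} + \cdots + w_{pt})
= \lambda_1^{-k}\alpha_{0t}(1 - w_{1t}) = \lambda_1^{-k}\alpha_{0t}(1 - r_t^2). \]

Hence

\[ r_t^2 \ge 1 - \frac{\lambda_1^k\alpha_{kt}}{\alpha_{0t}} = 1 - v_{kt}\lambda_1^k. \]

This unfortunately is not a very useful lower bound for $r_t$, since it approaches
zero, not unity, as $t$ increases.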

A more satisfactory result is obtained as follows. Let $\eta_i = \lambda_i^{-1}$. Then
$v_{kt} = \sum w_{it}\eta_i^k$. For any value of $t$ we may consider a distribution of a variate
taking the positive values $\eta_1, \ldots, \eta_p$ with the positive weights, or probabilities,
$w_{it}$. The $k$th moment of this distribution about 0 is $v_{kt}$. In particular the
first moment is $v_{1t}$, and is evidently at least equal to $\eta_1$, which is the least of the
$\eta_i$. The standard deviation is $\sigma = \sqrt{v_{2t} - v_{1t}^2}$. As $t$ increases, $v_{1t}$ will
approach $\eta_1$ and $\sigma$ will approach zero. Hence, if $\lambda_1 > \lambda_2$, a stage will eventually
be reached at which $v_{1t} < \eta_2$. Let

\[ k = \frac{\eta_2 - v_{1t}}{\sigma}. \]

By the Tchebychef-Bienaymé inequality,

\[ w_{2t} + \cdots + w_{pt} \le \frac{1}{k^2}, \]

and therefore

\[ r_t^2 = w_{1t} \ge 1 - \frac{v_{2t} - v_{1t}^2}{(\eta_2 - v_{1t})^2}, \]

provided $t$ is large enough so that $v_{1t} < \eta_2$. This lower bound approaches
unity, as desired, when $t$ increases.
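
The lower bound for $r_t^2$ needs only three scalar products of trial vectors and a value for $\eta_2 = 1/\lambda_2$ (or any smaller number that still exceeds $v_{1t}$). A minimal sketch in exact arithmetic follows; the matrix, the starting vector and the function name are illustrative assumptions, and $\eta_2$ is taken as known here purely so the result can be checked.

```python
from fractions import Fraction

def matvec(A, x):
    """Premultiply the column vector x by the matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def correlation_lower_bound(A, x0, t, eta2):
    """Lower bound for r_t^2 from the Tchebychef-Bienayme inequality.
    A must be symmetric with positive roots, t >= 2, and eta2 = 1/lambda_2
    (or a lower bound for it)."""
    trial = [list(x0)]
    for _ in range(t):
        trial.append(matvec(A, trial[-1]))
    alpha = [dot(trial[t], trial[t - h]) for h in range(3)]  # alpha_0t, alpha_1t, alpha_2t
    v1 = Fraction(alpha[1], alpha[0])
    v2 = Fraction(alpha[2], alpha[0])
    if not v1 < eta2:
        raise ValueError("t is not yet large enough: need v_1t < eta_2")
    return 1 - (v2 - v1 * v1) / (eta2 - v1) ** 2

A = [[2, 1], [1, 2]]                   # illustrative; roots 3 and 1, so eta_2 = 1
bound = correlation_lower_bound(A, [1, 0], 4, Fraction(1))
```

For this example the bound after four iterations is 6560/6561, while the true value of $r_t^2$ is 6561/6562, so the bound is both valid and very close.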

If $\lambda_1 = \lambda_2 = \cdots = \lambda_k > \lambda_{k+1}$, the same proof shows that

\[ w_{1t} + \cdots + w_{kt} \ge 1 - \frac{v_{2t} - v_{1t}^2}{(\eta_{k+1} - v_{1t})^2}, \]

provided $v_{1t} < \eta_{k+1}$. The left member is the squared correlation of $X_t$ with that one
of the $k$-parameter family of latent vectors corresponding to the multiple root
for which the correlation is a maximum.

In order to utilize these results we need a lower bound for $\eta_2$, or for $\eta_{k+1}$.
In case $\lambda_2 \ne 0$ this requires an upper bound for $\lambda_2$. Such an upper bound may
be found at the next stage through working with the reduced or "deflated"
matrix used in [17]. This is $A_1 = A - \lambda_1XX'$, where $X$ is the normalized latent
vector corresponding to $\lambda_1$; and $\lambda_2^t \le \operatorname{tr}(A_1^t)$.
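
Forming the reduced matrix is a one-line computation once the root and vector are in hand. The following sketch takes an unnormalized latent vector and normalizes it implicitly by dividing by its squared norm; the example matrix and the name `deflate` are illustrative assumptions.

```python
from fractions import Fraction

def deflate(A, root, vector):
    """Return A_1 = A - root * X X', where X is `vector` scaled to unit norm.
    The vector may be supplied unnormalized."""
    norm_sq = sum(v * v for v in vector)
    p = len(A)
    return [[Fraction(A[i][j]) - Fraction(root * vector[i] * vector[j], norm_sq)
             for j in range(p)] for i in range(p)]

A = [[2, 1], [1, 2]]                 # illustrative; roots 3 and 1
A1 = deflate(A, 3, [1, 1])           # remove the root 3 with latent vector (1, 1)
trace_A1 = A1[0][0] + A1[1][1]       # upper bound for lambda_2 (case t = 1)
```

Here the reduced matrix has roots 1 and 0, so its trace, 1, is exactly the second root; with more roots remaining the trace of a higher power of the reduced matrix would give a sharper bound, just as for the greatest root.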

Since we have arrived at a definite lower limit for $r_t$ which approaches unity
as the iterative process proceeds, and since we have found for $\lambda_1$ upper and
lower bounds converging to it, a solution has been found for the troublesome
problem of the degree of accuracy in stopping at any stage of the iteration for
finding the greatest root, and the associated latent vector. It would be possible
to go on to find from these results appropriate inequalities for $A_1$, and then by
repetition of the above arguments, for $\lambda_2$ and the second latent vector; and then
likewise for the second reduced matrix $A_2$ and the further roots, vectors, and
reduced matrices in this cyclic order. These steps may well be taken by the
computer who has mastered the above argument in connection with a numerical
example.

ON SOLUTIONS OF THE BEHRENS-FISHER PROBLEM,
BASED ON THE t-DISTRIBUTION


By Henry Scheffé

Princeton University

1. The Problem. The problem [1, 2] is the interval estimation¹ of the
difference of the means of two normal populations when the ratio of the variances
of the populations is unknown. The reader who wishes to see the present
solution before considering theoretical details will find it recapitulated in the
Summary at the end and will want to refer to the following notation:
$(x_1, x_2, \ldots, x_m)$ and $(y_1, y_2, \ldots, y_n)$ are random samples from normal
populations with means $\alpha$ and $\beta$, and variances $\mu$ and $\nu$, respectively. Define
$\delta = \alpha - \beta$. We assume $m \le n$, and that the variates in each sample are in
the order of observation, or else have been randomized.

Recently Neyman [3] has called attention to a solution which we shall
designate as (B), and which is a special case of an unpublished solution of Bartlett.²
It will be simpler to describe (B) later, but we mention now that it has the
following advantages: (i) its validity does not depend on the values of unknown
parameters, (ii) the required computations are simple, and (iii) only existing
tables are needed, the widely available Fisher t-tables. An unsatisfactory
aspect of (B) is that when the sample sizes are unequal, $n - m$ of the variates
$y_i$ are completely discarded. The solution below shares with (B) the
advantages (i), (ii), (iii); indeed, it is identical with (B) when $n = m$, but when
$n \ne m$ it is free from the above objection.

¹ We treat the problem from the standpoint of confidence intervals, rather than
significance tests, since when the former are available for $\delta$ so is a whole class of the latter, namely
for any hypothesis $\delta = \delta_0$, for all $\delta_0$. Furthermore, questions of the existence of "best"
tests and "best" confidence intervals are closely related [5a].

² How far Bartlett followed the path of this paper is not clear from the brief mention
of his results by Welch [4], except that he did establish the sufficiency of certain
orthogonality conditions.

2. Simple Solution. We begin with a simple restricted approach; later we
will review the result from a somewhat broader standpoint. If random variables
$d_1, d_2, \ldots, d_m$ are independently normally distributed with mean $\delta$ and
variance $\sigma^2$, and if $L$ and $Q$ are defined from

\[ L = \sum_{i=1}^m d_i/m, \qquad Q = \sum_{i=1}^m (d_i - L)^2, \]

then $m^{1/2}(L - \delta)/\sigma$ and $Q/\sigma^2$ are independently distributed; the former is a
normal variable with zero mean and unit variance; the latter, $Q/\sigma^2 = \chi^2_{m-1}$,
where $\chi^2_k$ is a generic notation for a random variable distributed according to
the $\chi^2$-law with $k$ degrees of freedom. The quotient

\[ m^{1/2}(L - \delta)/[Q/(m - 1)]^{1/2} = t_{m-1}, \]

where $t_k$ denotes generically a variable having the t-distribution with $k$ degrees
of freedom. Define $t_{k,\epsilon}$ from

\[ \Pr(-t_{k,\epsilon} \le t_k \le t_{k,\epsilon}) = \epsilon. \tag{1} \]

Then a set of confidence intervals for $\delta$ with confidence coefficient $\epsilon$ is

\[ |\delta - L| \le t_{m-1,\epsilon}\{Q/[m(m - 1)]\}^{1/2}. \tag{2} \]

Denote by $E(l)$ the expected length of the confidence interval (2),

\[ E(l) = 2t_{m-1,\epsilon}[m(m - 1)]^{-1/2}\,\sigma E[(Q/\sigma^2)^{1/2}], \]

\[ E(l) = t_{m-1,\epsilon}\,c_{m-1}\,\sigma/m^{1/2}, \tag{3} \]

where

\[ c_k = 2k^{-1/2}E(\chi_k) = (8/k)^{1/2}\,\Gamma\!\left(\tfrac{k+1}{2}\right)\Big/\Gamma\!\left(\tfrac{k}{2}\right). \]

The symmetrical choice (1) of the limits on the t-distribution minimizes (3).

We consider using in connection with the confidence intervals (2) linear
functions

\[ d_i = x_i - \sum_{j=1}^n c_{ij}y_j, \qquad i = 1, 2, \ldots, m. \tag{4} \]

The variables $d_i$ have a multivariate normal distribution. Necessary and
sufficient conditions that the $d_i$ all have the same mean $\delta$, equal variances $\sigma^2$
and zero covariances, are easily found to be

\[ \sum_{j=1}^n c_{ij} = 1, \qquad \sum_{k=1}^n c_{ik}c_{jk} = c^2\delta_{ij}; \tag{5} \]

here $\delta_{ii} = 1$, $\delta_{ij} = 0$ if $i \ne j$. If (4) are used in (2), $E(l)$ is given by (3) with
$\sigma^2 = \mu + c^2\nu$. Hence to minimize $E(l)$ we must find an $m \times n$ matrix $C = (c_{ij})$
satisfying (5), and for which $c^2$ is minimum. The minimum value of $c^2$ is $m/n$;
this is easily proved by the use of vector algebra.
Let $\gamma_i$ be the $i$th row of $C$, and let $\eta$ be the $1 \times n$ matrix $(1, 1, \ldots, 1)$.
Denote the transpose of a matrix by a prime. Then the conditions (5) read

\[ \gamma_i\eta' = 1, \qquad \gamma_i\gamma_j' = c^2\delta_{ij}. \tag{6} \]

Adjoin to $\gamma_1, \ldots, \gamma_m$ further vectors $\gamma_{m+1}, \ldots, \gamma_n$ so that the whole set is
orthogonal, in accordance with the second group of conditions (6). Since this set is
a basis,

\[ \eta = \sum_{k=1}^n g_k\gamma_k, \]

where the $g_k$ are scalars. Now

\[ 1 = \gamma_i\eta' = \sum_{k=1}^n g_k\gamma_i\gamma_k' = g_ic^2, \qquad i = 1, 2, \ldots, m, \]

and thus $g_1 = g_2 = \cdots = g_m = c^{-2}$. But

\[ n = \eta\eta' = \sum_{k=1}^n g_k^2\gamma_k\gamma_k' = mc^{-2} + \sum_{k=m+1}^n g_k^2\gamma_k\gamma_k' \ge mc^{-2}, \]

and hence $c^2 \ge m/n$. On the other hand this lower bound for $c^2$ may be
attained by taking any set $\beta_1, \beta_2, \ldots, \beta_m$ of orthogonal vectors with squared norms
$m/n$, that is, $\beta_i\beta_j' = \delta_{ij}m/n$, and rotating them so that their equal angles vector
$\lambda = (n/m)(\beta_1 + \beta_2 + \cdots + \beta_m)$ coincides with $\eta$. Then $\lambda S = \eta$ where $SS' = I$.
For $\gamma_i = \beta_iS$,

\[ \gamma_i\eta' = \beta_iSS'\lambda' = \beta_i\lambda' = 1, \]

\[ \gamma_i\gamma_j' = \beta_iSS'\beta_j' = \beta_i\beta_j' = \delta_{ij}m/n, \]

so that equations (6) are satisfied with $c^2 = m/n$.

An especially neat solution of this minimum problem was obtained by the
above method; its validity may easily be verified directly. It is

\[ c_{ij} = \begin{cases}
\delta_{ij}(m/n)^{1/2} - (mn)^{-1/2} + 1/n, & j \le m, \\
1/n, & j > m.
\end{cases} \]

Then

\[ d_i = x_i - (m/n)^{1/2}y_i + (mn)^{-1/2}\sum_{j=1}^m y_j - \sum_{j=1}^n y_j/n \]

and $L$ and $Q$ become simply $L = \bar{x} - \bar{y}$ and

\[ Q = \sum_{i=1}^m (u_i - \bar{u})^2, \tag{7} \]

where

\[ \bar{x} = \sum_{i=1}^m x_i/m, \quad \bar{y} = \sum_{j=1}^n y_j/n, \quad
u_i = x_i - (m/n)^{1/2}y_i, \quad \bar{u} = \sum_{i=1}^m u_i/m. \tag{8} \]

We may now write (2) as³

\[ \bar{x} - \bar{y} - t_{m-1,\epsilon}\{Q/[m(m - 1)]\}^{1/2} \le \delta \le
\bar{x} - \bar{y} + t_{m-1,\epsilon}\{Q/[m(m - 1)]\}^{1/2}. \tag{9} \]

The solution (B) mentioned at the beginning consists of taking $c_{ij} = \delta_{ij}$ in
(4), so that the conditions (5) are satisfied with $c^2 = 1$. Hence for both (B)
and (9) the expected length of the confidence interval is given by (3), but with
$\sigma^2 = \mu + \nu$ for (B), while $\sigma^2 = \mu + (m/n)\nu$ for (9).
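
The interval (9) and the coefficient matrix behind it are easy to compute. The following Python sketch is an illustration only: the function names are hypothetical, the data are invented, and the value 3.182 is the usual tabulated two-sided 95 per cent point of the t-distribution with 3 degrees of freedom, which the caller must supply from a table since the standard library has no t quantile function.

```python
import math

def scheffe_coefficients(m, n):
    """The m x n matrix (c_ij) of the 'especially neat' solution."""
    a = math.sqrt(m / n)
    b = 1.0 / n - 1.0 / math.sqrt(m * n)
    rows = []
    for i in range(m):
        row = []
        for j in range(n):
            if j < m:
                row.append((a if i == j else 0.0) + b)
            else:
                row.append(1.0 / n)
        rows.append(row)
    return rows

def scheffe_interval(x, y, t_quantile):
    """Confidence limits (9) for delta = alpha - beta.
    x has m values, y has n >= m values, both in the order of observation;
    t_quantile is t_{m-1, eps} taken from a table."""
    m, n = len(x), len(y)
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= n")
    L = sum(x) / m - sum(y) / n
    ratio = math.sqrt(m / n)
    u = [x[i] - ratio * y[i] for i in range(m)]      # u_i of (8)
    u_bar = sum(u) / m
    Q = sum((ui - u_bar) ** 2 for ui in u)           # Q of (7)
    half = t_quantile * math.sqrt(Q / (m * (m - 1)))
    return L - half, L + half

# Invented data: m = 4, n = 16, so (m/n)^(1/2) = 1/2 exactly.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 0.0, 2.0, 0.0] + [0.0] * 12
low, high = scheffe_interval(x, y, 3.182)
C = scheffe_coefficients(4, 16)
```

For these data $L = 2.25$ and $Q = 8$, and the matrix `C` can be checked to have unit row sums and mutually orthogonal rows of squared length $m/n = 1/4$, as conditions (5) require with $c^2 = m/n$.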


³ Obvious modifications of (9) will make it suitable for "one-sided" estimation.

3. More General Solutions. We now generalize our approach to the following
extent: Let $L$ be a linear form and $Q$ a quadratic form in the variates
$x_1, \ldots, x_m$, $y_1, \ldots, y_n$, with coefficients independent of the parameters
(i. of p.). If for some constant $h$, i. of p., and some function $f$ of the parameters,
$h(L - \delta)/f$ and $Q/f^2$ are independently distributed, the former according to the
normal law with zero mean and unit variance, the latter according to the $\chi^2$-law
with $k - 1$ degrees of freedom, then the quotient

\[ h(L - \delta)/[Q/(k - 1)]^{1/2} \tag{10} \]

will have the t-distribution with $k - 1$ degrees of freedom, no matter what the
values of the parameters.

We note that necessarily then

\[ E(L) = \delta, \tag{11} \]

\[ f^2 = h^2E[(L - \delta)^2]. \tag{12} \]

The t-distribution of (10) leads to the confidence intervals

\[ |\delta - L| \le t_{k-1,\epsilon}[Q/(k - 1)]^{1/2}/h, \tag{13} \]

where $t_{k-1,\epsilon}$ is defined by (1), and the confidence coefficient is $\epsilon$. Proceeding as
toward (3), we find that the expected length of (13) is

\[ E(l) = t_{k-1,\epsilon}\,c_{k-1}\,f/h. \tag{14} \]

If $L = \sum_{i=1}^m a_ix_i - \sum_{i=1}^n b_iy_i$,

\[ E(L) = \alpha\sum_{i=1}^m a_i - \beta\sum_{i=1}^n b_i. \tag{15} \]

Since $a_i$, $b_i$ are i. of p., it follows from (11) and (15) that

\[ \sum_{i=1}^m a_i = \sum_{i=1}^n b_i = 1. \tag{16} \]

Writing

\[ \xi_i = x_i - \alpha, \qquad \eta_i = y_i - \beta, \qquad
L - \delta = \sum_{i=1}^m a_i\xi_i - \sum_{i=1}^n b_i\eta_i, \tag{17} \]

\[ E[(L - \delta)^2] = \mu\sum_{i=1}^m a_i^2 + \nu\sum_{i=1}^n b_i^2 = f^2/h^2 \tag{18} \]

from (12); thus (14) may be written

\[ E(l) = t_{k-1,\epsilon}\,c_{k-1}\left[\mu\sum_{i=1}^m a_i^2 + \nu\sum_{i=1}^n b_i^2\right]^{1/2}. \tag{19} \]

From (18) we also have

\[ f^2 = a^2\mu + b^2\nu, \tag{20} \]

where

\[ a^2 = h^2\sum_{i=1}^m a_i^2, \qquad b^2 = h^2\sum_{i=1}^n b_i^2 \]

are i. of p.

4. Lemma. We propose to prove next that the maximum value of $k$ is $m$,
that is to say, it is impossible to obtain a t-distribution for a quotient (10) with
more than $m - 1$ degrees of freedom. For this we need a lemma to the effect
that certain well known sufficient conditions for a quadratic form to have a
$\chi^2$-distribution are also necessary.

Since under our assumptions $h^2(L - \delta)^2/f^2 = \chi^2_1$ and $Q/f^2 = \chi^2_{k-1}$ are
independent, therefore $Q^*/f^2 = \chi^2_k$, where

\[ Q^* = h^2(L - \delta)^2 + Q. \]

To shorten the notation, write

\[ z_i = \begin{cases} x_i, & i = 1, 2, \ldots, m, \\ y_{i-m}, & i = m + 1, \ldots, m + n, \end{cases} \]

\[ \alpha_i = E(z_i), \qquad \zeta_i = z_i - \alpha_i, \qquad \sigma_i^2 = E(\zeta_i^2). \]

Let

\[ Q = \sum_{s,t} q_{st}z_sz_t, \]

where the indices $s$ and $t$ range from 1 to $m + n$ throughout. Then $q_{st}$ is i. of p.,
and

\[ Q = \sum_{s,t} q_{st}\zeta_s\zeta_t + 2\sum_s q_s\zeta_s + q, \]

where

\[ q_s = \sum_t q_{st}\alpha_t, \qquad q = \sum_s q_s\alpha_s. \]

From (17)

\[ h^2(L - \delta)^2 = \sum_{s,t} p_{st}\zeta_s\zeta_t, \]

where $p_{st}$ are i. of p. Putting $q^*_{st} = q_{st} + p_{st}$, $q^*_{st}$ are i. of p., and

\[ Q^* = \sum_{s,t} q^*_{st}\zeta_s\zeta_t + 2\sum_s q_s\zeta_s + q. \tag{21} \]

The moment-generating function of $Q^*/f^2$ is

\[ \phi(\theta) = E[\exp(\theta Q^*/f^2)] = C_1\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}
\exp\Big(\theta Q^*/f^2 - \tfrac12\sum_s \zeta_s^2/\sigma_s^2\Big)\prod_s d\zeta_s. \]

There exists a non-singular linear transformation from the $\zeta$'s to $v$'s such that

\[ \sum_s \zeta_s^2/\sigma_s^2 = \sum_s v_s^2, \qquad \sum_{s,t} q^*_{st}\zeta_s\zeta_t = \sum_s \lambda_sv_s^2. \tag{22} \]

Then

\[ \sum_s q_s\zeta_s = \sum_s \rho_sv_s, \tag{23} \]

\[ \phi(\theta) = C_2e^{\theta q/f^2}\prod_s\int_{-\infty}^{+\infty}
\exp\{-\tfrac12[v_s^2 - 2\theta(\lambda_sv_s^2 + 2\rho_sv_s)/f^2]\}\,dv_s \]

\[ = e^{\theta q/f^2}\prod_s (1 - 2\theta\lambda_s/f^2)^{-1/2}
\exp\{2\theta^2\rho_s^2/[f^2(f^2 - 2\theta\lambda_s)]\}. \]

Now $Q^*/f^2 = \chi^2_k$ if and only if

\[ \phi(\theta) = (1 - 2\theta)^{-k/2}. \]

Hence

\[ \rho_s = 0, \qquad q = 0, \tag{24} \]

and $k$ of the $\lambda_s$ must be equal to $f^2$ while the remaining $\lambda_s$ vanish. No generality
is lost in assuming

\[ \lambda_1 = \lambda_2 = \cdots = \lambda_k = f^2, \qquad \lambda_{k+1} = \cdots = \lambda_{m+n} = 0. \tag{25} \]

Let $w_i = fv_i$, $i = 1, 2, \ldots, k$. From equations (21) to (25) we deduce that

\[ Q^* = \sum_{s,t} q^*_{st}\zeta_s\zeta_t = \sum_{i=1}^k w_i^2, \tag{26} \]

where $q^*_{st}$ is i. of p., and the $w_i$ are linear combinations of the $\zeta_s$ such that

\[ E(w_iw_j) = f^2\delta_{ij}. \tag{27} \]

That the conditions (26) are necessary⁴ for $Q^*/f^2 = \chi^2_k$ constitutes the desired
lemma.

5. Maximum Number of Degrees of Freedom. We have seen that the $w_i$
in (26) must be of the form

\[ w_i = \sum_{j=1}^m a_{ij}\xi_j + \sum_{j=1}^n b_{ij}\eta_j. \tag{28} \]

We substitute (28) and (20) into (27) and write the result in matrix form,

\[ \mu AA' + \nu BB' = (a^2\mu + b^2\nu)I_k, \tag{29} \]

where $I_j$ is the identity matrix of order $j$, $A^{k\times m} = (a_{ij})$, $B^{k\times n} = (b_{ij})$, and
whenever a new matrix is introduced, a superscript $r \times c$ indicates that it has $r$ rows
and $c$ columns. Now if we knew that $AA'$ and $BB'$ were i. of p., then we could
equate coefficients of $\mu$ and $\nu$ in (29) and immediately draw the desired
conclusion $k \le m$. But that $AA'$, $BB'$ are i. of p. is not obvious, since this need not
be true of $A$ and $B$. However, we do know that the matrices

\[ F = A'A, \qquad G = A'B, \qquad H = B'B \]

are i. of p. because the matrix $(q^*_{st})$ of (26) is

\[ \begin{pmatrix} F & G \\ G' & H \end{pmatrix}. \]

Multiplying (29) on the left by $A'$ and on the right by $A$, we obtain

\[ \mu F^2 + \nu GG' = (a^2\mu + b^2\nu)F. \tag{30} \]

(30) must hold identically in $\mu$, $\nu$. Since the coefficients of $\mu$, $\nu$ are now i. of p.,
we may equate them, hence $GG' = b^2F$. Similarly multiplying (29) by $B'$ and $B$,
we get $G'G = a^2H$. Now⁵ for any matrix $M$, rank $M$ = rank $M'M$ = rank $MM'$.
Thus rank $F$ = rank $H$ = $r$, say. Again, $F = A'A$, therefore $r$ = rank $A \le m$.
Since $F$ is a positive⁵ matrix, i. of p., there exists a non-singular $P^{m\times m}$, i. of p.,
such that

\[ F = P'I_{m,r}P = A'A, \tag{31} \]

where $I_{j,r}$ is the $j \times j$ matrix the first $r$ of whose diagonal elements are unity and
all other elements zero. Let $T^{k\times m} = AP^{-1}$. Then

\[ A = TP, \qquad T'T = I_{m,r} \tag{32} \]

from (31). Likewise we can write

\[ B = U^{k\times n}R^{n\times n}, \qquad U'U = I_{n,r}, \tag{33} \]

where $R$ is non-singular and i. of p. Then $G = A'B = P'T'UR$, hence
$T'U = (P')^{-1}GR^{-1}$ is i. of p. We note

\[ T = (T_1^{k\times r},\ 0^{k\times(m-r)}), \qquad U = (U_1^{k\times r},\ 0^{k\times(n-r)}), \]

where

\[ T_1'T_1 = U_1'U_1 = I_r. \]

Since

\[ T'U = \begin{pmatrix} T_1'U_1 & 0 \\ 0 & 0 \end{pmatrix} \]

is i. of p., so is its minor $V^{r\times r} = T_1'U_1$.

Write

\[ P = \begin{pmatrix} P_1^{r\times m} \\ P_2^{(m-r)\times m} \end{pmatrix}, \qquad
R = \begin{pmatrix} R_1^{r\times n} \\ R_2^{(n-r)\times n} \end{pmatrix}. \]

Then from (32), (33),

\[ A = T_1P_1, \qquad B = U_1R_1. \tag{34} \]

Substituting (34) in (29), we get

\[ \mu T_1P_1P_1'T_1' + \nu U_1R_1R_1'U_1' = (a^2\mu + b^2\nu)I_k, \tag{35} \]

and multiplying by $T_1'$ on the left, $T_1$ on the right,

\[ \mu P_1P_1' + \nu VR_1R_1'V' = (a^2\mu + b^2\nu)I_r. \]

Again the coefficients of $\mu$, $\nu$ are i. of p., so

\[ P_1P_1' = a^2I_r, \]

\[ VR_1R_1'V' = b^2I_r. \tag{36} \]

Similarly we find

\[ R_1R_1' = b^2I_r. \tag{37} \]

From (36), (37), $VV' = I_r$. (35) now becomes

\[ a^2\mu T_1T_1' + b^2\nu U_1U_1' = (a^2\mu + b^2\nu)I_k. \tag{38} \]

Multiplication of (38) on the right by $U_1$ gives

\[ a^2\mu T_1V + b^2\nu U_1 = (a^2\mu + b^2\nu)U_1. \]

Hence $T_1V = U_1$, therefore $U_1U_1' = T_1VV'T_1' = T_1T_1'$, and putting this back into (38)
we have $I_k = T_1T_1'$, rank $I_k$ = rank $T_1T_1'$ = rank $T_1'T_1$ = rank $I_r$, $k = r \le m$.

⁴ We have incidentally proved sufficiency.

⁵ A simple proof [6b] of this useful theorem is the following: Let $r$ = rank $M$, $\rho$ = rank
$M'M$. $\rho \le r$ since the rank of the product cannot exceed the rank of a factor. $M$ contains
$r$ independent column vectors; the Grammian matrix of these vectors is non-singular and
appears as an $r \times r$ minor in $M'M$. Hence $\rho \ge r$. Furthermore, all principal minors of
$M'M$ are Grammian matrices (which always have non-negative determinants), hence $M'M$
is always positive; we use this below.


6. Minimum Expected Length of Confidence Intervals. We now point out
that of all confidence intervals (13) with $k = m$, the confidence intervals (9) have
the minimum expected length. Recalling that the $a_i$, $b_i$ in (19) are subject
to the conditions (16), we easily find

\[ \sum_{i=1}^m a_i^2 \ge 1/m, \qquad \sum_{i=1}^n b_i^2 \ge 1/n. \tag{39} \]

From (39) and (19) we have

\[ E(l) \ge t_{m-1,\epsilon}\,c_{m-1}[\mu + (m/n)\nu]^{1/2}/m^{1/2}, \]

and referring to the statement at the end of section 2, the property of (9) asserted
above is now obvious.

7. Asymptotic Shortness of Confidence Intervals. In conclusion we wish
to compare our results with the case where the ratio of the variances, $\theta = \nu/\mu$,
is known. If

\[ S_x = \sum_{i=1}^m (x_i - \bar{x})^2, \qquad S_y = \sum_{i=1}^n (y_i - \bar{y})^2, \]

\[ L = \bar{x} - \bar{y}, \qquad \sigma_L^2 = (\mu/m) + (\nu/n), \]

then $(L - \delta)/\sigma_L$, $S_x/\mu$, $S_y/\nu$ are mutually independently distributed, the first
normally with zero mean and unit variance, and $S_x/\mu = \chi^2_{m-1}$, $S_y/\nu = \chi^2_{n-1}$.
Hence

\[ (L - \delta)\{(m^{-1} + \theta n^{-1})(S_x + S_y/\theta)/(m + n - 2)\}^{-1/2} = t_{m+n-2}. \]


TABLE I

Values of R for ε = .95

             n − 1
  m − 1      5       10      20      40      ∞
    5       1.15    1.20    1.23
   10               1.05    1.07
   20                       1.03
   40


TABLE II

Values of R for ε = .99

             n − 1
  m − 1      5       10      20      40      ∞
    5       1.27    1.36    1.42            1.52
   10                       1.13            1.20
   20                       1.05
   40


This relation yields the confidence intervals

(40)    $|\delta - L| \le t_{m+n-2,\epsilon}\,(m + n - 2)^{-1/2}(m^{-1} + \theta n^{-1})^{1/2}(S_x + \theta^{-1} S_y)^{1/2},$

where the confidence coefficient is again $\epsilon$. The confidence intervals (40) are
known to be highly efficient; for instance they are of the shortest unbiased
type [5a]. We calculate their expected length to be

E(l) - + (ro/n>j*/m‘, 

The ratio R of E(l) for (9) to E(l) for (40) is thus

(41)    $R = (t_{m-1,\epsilon}\,c_{m-1})/(t_{m+n-2,\epsilon}\,c_{m+n-2}).$

As $k \to \infty$, $c_k \to 2$, $t_{k,\epsilon} \to t_{\infty,\epsilon}$, hence as $m \to \infty$, $R \to 1$ no matter what the
values of $n \ge m$. For small values of m the ratio of the t values in (41) is
considerably > 1, but this is partly offset by $c_k$ approaching its limiting value 2
from below so that the ratio of the c's is < 1. The behaviour of R for finite m
is indicated in Tables I and II. Table I (II) tells us for example that with
m > 10, and $\epsilon$ = .95 (.99), the expected length of the confidence intervals (9)
is at most 11 per cent (20%) longer than that of the optimum confidence
intervals (40) available when the ratio $\theta$ is known. While we may conclude from
$R \to 1$ as $m \to \infty$ that our solution (9) is asymptotically extremely efficient, we
cannot conclude from Tables I, II that for small m (9) is inefficient, since we
do not know what the lengthening effect of the extra nuisance parameter in the
Behrens-Fisher problem would be on "best" confidence intervals.

8. Summary. In the terminology of the first paragraphs of sections 1 and 3
we have proved that there do not exist a linear form L and a quadratic form Q
in the observations such that the quotient (10) will have the t-distribution (for
all values of the parameters) with more than m - 1 degrees of freedom. We
have further shown that of all confidence intervals (13) based on the t-distribution
with m - 1 degrees of freedom, and with confidence coefficient $\epsilon$, (9) has
the minimum expected length. The quantities needed to apply our solution
(9) are given by (1), (7) and (8). Finally, by comparing this solution with a
known highly efficient solution for the case when the ratio of the population
variances is known, it has been possible to show that at least asymptotically
our confidence intervals (9) are very short.
AN EXTENSION OF WILKS' METHOD FOR SETTING TOLERANCE

LIMITS


By Abraham Wald
Columbia University

1. Introduction. Let $x$ be a random variable and let $f(x)$ be its probability
density function. Suppose that nothing is known about $f(x)$ except that it is
continuous. Let $x_1, \cdots, x_n$ be n independent observations on $x$. The problem
of setting tolerance limits can be formulated as follows: For some given
positive values $\beta < 1$ and $\gamma < 1$ we have to construct two functions
$L(x_1, \cdots, x_n)$ and $M(x_1, \cdots, x_n)$, called tolerance limits, such that the
probability that

(1)    $\int_L^M f(t)\,dt \ge \gamma$

holds, is equal to $\beta$. This problem has recently been solved by S. S. Wilks$^1$ in
a very satisfactory way when nothing is known about $f(x)$ except that it is
continuous. Wilks proposes the following solution: Let $x_1, \cdots, x_n$ be the observed
values of $x$ arranged in order of increasing magnitude. Then $L = x_r$ and
$M = x_{n-r+1}$ where r denotes a positive integer. The exact sampling distribution
of the statistic $\int_L^M f(t)\,dt$ is derived by Wilks and this provides the solution
for the problem of setting tolerance limits. A very important feature of Wilks'
solution is the fact that the distribution of $\int_L^M f(t)\,dt$ is entirely independent
of the unknown density function $f(x)$, i.e. the distribution of
$\int_{x_r}^{x_{n-r+1}} f(t)\,dt$ is the same for any arbitrary continuous density function $f(x)$.

In this paper we shall give an extension of Wilks' method to the multivariate
case. Let $x_1, \cdots, x_p$ be a set of p random variables with the joint probability
density function $f(x_1, \cdots, x_p)$. Suppose that nothing is known about
$f(x_1, \cdots, x_p)$ except that it is a continuous function of $x_1, \cdots, x_p$. A sample
of n independent observations is drawn and the $\alpha$-th observation on $x_i$ is denoted
by $x_{i\alpha}$ ($i = 1, \cdots, p$; $\alpha = 1, \cdots, n$). The problem of setting tolerance limits
for $x_1, \cdots, x_p$ can be formulated as follows: For some given positive values
$\beta < 1$ and $\gamma < 1$ we have to construct p pairs of functions of the observations
$L_i(x_{11}, \cdots, x_{pn})$ and $M_i(x_{11}, \cdots, x_{pn})$ ($i = 1, \cdots, p$) such that the
probability that

(2)    $\int_{L_p}^{M_p} \cdots \int_{L_1}^{M_1} f(t_1, \cdots, t_p)\,dt_1 \cdots dt_p \ge \gamma$

holds, is equal to $\beta$. The functions $L_i$ and $M_i$ are called the lower and upper
tolerance limits of $x_i$. A natural extension of Wilks' procedure would seem to
be the following: Let $x_{i1}, \cdots, x_{in}$ be the observations on $x_i$ arranged in order
of increasing magnitude and let $L_i = x_{i r_i}$ and $M_i = x_{i s_i}$ ($i = 1, \cdots, p$) where
$r_i$ and $s_i$ denote some integers. However, this choice of the tolerance limits
does not provide a satisfactory solution of our problem, since the distribution
of (2) is not independent of the unknown density function $f(x_1, \cdots, x_p)$. It
will be shown in this paper that by a slight modification of the above procedure
the distribution of (2) becomes entirely independent of the unknown density
function $f(x_1, \cdots, x_p)$. In section 2 we will treat the bivariate case and in
section 3 we will extend the results to multivariate distributions.

1 S. S. Wilks, "Determination of sample sizes for setting tolerance limits," Annals of
Math. Stat., Vol. 12 (1941).

2. The bivariate case. In this section we deal with the case when p = 2.
Let $x_{11}, \cdots, x_{1n}$ be the observations on $x_1$ arranged in order of increasing
magnitude. We may assume that $x_{11} < x_{12} < \cdots < x_{1n}$ since the probability of an
equality sign is equal to zero. We define

(3)    $L_1 = x_{1 r_1}$ and $M_1 = x_{1 s_1}$,

where $r_1$ and $s_1$ denote some positive integers and $r_1 < s_1 \le n$. Consider only
those sample points $(x_{1\alpha}, x_{2\alpha})$ for which $x_{1 r_1} < x_{1\alpha} < x_{1 s_1}$, i.e. consider
the sample points $(x_{1,r_1+1}, x_{2,r_1+1}), \cdots, (x_{1,s_1-1}, x_{2,s_1-1})$. Denote by
$x'_{2,r_1+1}, \cdots, x'_{2,s_1-1}$ the values $x_{2,r_1+1}, \cdots, x_{2,s_1-1}$ arranged in order of
increasing magnitude. We define

(4)    $L_2 = x'_{2 r_2}$ and $M_2 = x'_{2 s_2}$,

that is, $L_2$ and $M_2$ are the $r_2$-th and the $s_2$-th smallest of these ordered values,
where $r_2$ and $s_2$ denote some positive integers for which $r_2 < s_2 \le s_1 - r_1 - 1$.
We will show that the distribution of the statistic

(5)    $Q = \int_{L_2}^{M_2} \int_{L_1}^{M_1} f(t_1, t_2)\,dt_1\,dt_2$

is entirely independent of the unknown density function $f(x_1, x_2)$. Denote by
$\varphi(x_1)$ the marginal distribution of $x_1$, i.e.

(6)    $\varphi(x_1) = \int_{-\infty}^{+\infty} f(x_1, x_2)\,dx_2.$

Furthermore denote by $\psi(x_2, L_1, M_1)$ the conditional distribution of $x_2$
calculated under the condition that $L_1 < x_1 < M_1$. Hence

(7)    $\psi(x_2, L_1, M_1) = \dfrac{\int_{L_1}^{M_1} f(x_1, x_2)\,dx_1}{\int_{-\infty}^{+\infty}\int_{L_1}^{M_1} f(x_1, x_2)\,dx_1\,dx_2}.$

Let

(8)    $P = \int_{L_1}^{M_1} \varphi(t)\,dt$

and

(9)    $\bar{P} = \int_{L_2}^{M_2} \psi(t, L_1, M_1)\,dt.$

From (5), (8) and (9) it follows that

(10)    $Q = P\bar{P}.$

It is obvious that the distribution of $P$ is given by Wilks' formula. Since Wilks
derived the distribution only when $s_1 = n - r_1 + 1$, we will briefly give here the
derivation for any integers $r_1$ and $s_1$.

Let $\int_{-\infty}^{x_{1 r_1}} \varphi(t)\,dt = u$ and $\int_{x_{1 s_1}}^{+\infty} \varphi(t)\,dt = v$. Then the joint
probability density function of $u$ and $v$ is given by

(11)    $c\,u^{r_1-1}(1 - u - v)^{s_1-r_1-1}\,v^{n-s_1}\,du\,dv,$

where $c$ is a constant. We obviously have $P = 1 - u - v$. The joint density
function of $P$ and $u$ is given by

(12)    $c\,u^{r_1-1}P^{s_1-r_1-1}(1 - u - P)^{n-s_1}\,du\,dP,$

where $u$ is restricted to the interval $[0, 1 - P]$. Hence the distribution of $P$
is given by

$c\,P^{s_1-r_1-1}\int_0^{1-P} u^{r_1-1}(1 - u - P)^{n-s_1}\,du$

$= c\,P^{s_1-r_1-1}(1 - P)^{n-s_1+r_1-1}\int_0^{1-P} \left(\frac{u}{1-P}\right)^{r_1-1}\left(1 - \frac{u}{1-P}\right)^{n-s_1} du$

$= c\,P^{s_1-r_1-1}(1 - P)^{n-s_1+r_1}\int_0^1 T^{r_1-1}(1 - T)^{n-s_1}\,dT$

$= c'\,P^{s_1-r_1-1}(1 - P)^{n-s_1+r_1}.$

Since the integral of the density function of $P$ over the range of $P$ must be
equal to 1, we find that

$c' = \Gamma(n + 1)/[\Gamma(s_1 - r_1)\,\Gamma(n - s_1 + r_1 + 1)].$

Hence the probability density function of $P$ is given by

(13)    $\dfrac{\Gamma(n + 1)}{\Gamma(s_1 - r_1)\,\Gamma(n - s_1 + r_1 + 1)}\,P^{s_1-r_1-1}(1 - P)^{n-s_1+r_1}\,dP.$
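The value of the normalizing constant can be checked against the Beta integral, a standard identity that the derivation uses implicitly. The following worked step is an added illustration; it writes $c'$ for the constant multiplying $P^{s_1-r_1-1}(1-P)^{n-s_1+r_1}$ in the density of $P$, and the symbols $a$, $b$ are introduced here only for brevity.

```latex
\int_0^1 P^{a-1}(1-P)^{b-1}\,dP = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)},
\qquad a = s_1 - r_1,\quad b = n - s_1 + r_1 + 1,\quad a + b = n + 1,
\\[4pt]
c' \int_0^1 P^{s_1-r_1-1}(1-P)^{n-s_1+r_1}\,dP = 1
\;\Longrightarrow\;
c' = \frac{\Gamma(n+1)}{\Gamma(s_1-r_1)\,\Gamma(n-s_1+r_1+1)} .
```

In other words $P$ follows a Beta distribution with parameters $s_1 - r_1$ and $n - s_1 + r_1 + 1$, whose mean is $(s_1 - r_1)/(n + 1)$.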


Since $x'_{2,r_1+1}, \cdots, x'_{2,s_1-1}$ can be considered as $s_1 - r_1 - 1$ independent
observations on a random variable $t$ the distribution of which is given by
$\psi(t, L_1, M_1)\,dt$, for any given values $L_1$, $M_1$ the conditional distribution of
$\bar{P}$ is given by the expression we obtain from (13) by substituting $r_2$ for $r_1$,
$s_2$ for $s_1$ and $s_1 - r_1 - 1$ for $n$. Hence the conditional distribution of $\bar{P}$ is
given by

(14)    $\dfrac{\Gamma(s_1 - r_1)}{\Gamma(s_2 - r_2)\,\Gamma(s_1 - r_1 - s_2 + r_2)}\,\bar{P}^{s_2-r_2-1}(1 - \bar{P})^{s_1-r_1-1-s_2+r_2}\,d\bar{P}.$

Since the expression (14) does not involve the quantities $L_1$ and $M_1$, $\bar{P}$ is
distributed independently of $L_1$ and $M_1$. Hence the joint density function of $P$
and $\bar{P}$ is given by the product of (13) and (14), i.e. by

(15)    $A\,P^{s_1-r_1-1}(1 - P)^{n-s_1+r_1}\,\bar{P}^{s_2-r_2-1}(1 - \bar{P})^{s_1-r_1-1-s_2+r_2}\,dP\,d\bar{P},$

where $A$ denotes the product of the constant coefficients in (13) and (14). From
(15) it follows that the joint distribution of $P$ and $Q = P\bar{P}$ is given by

(16)    $A\,(1 - P)^{n-s_1+r_1}\,Q^{s_2-r_2-1}(P - Q)^{s_1-r_1-1-s_2+r_2}\,dP\,dQ.$
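The passage from (15) to (16) is a change of variables from the second factor to the product $Q$, with $P$ held fixed. The step is spelled out below as an added illustration; $\bar{P}$ denotes the conditional coverage defined in (9), the bar being a notation chosen here for legibility.

```latex
\bar{P} = \frac{Q}{P}, \qquad d\bar{P} = \frac{dQ}{P} \quad (P \text{ fixed}),
\\[4pt]
P^{s_1-r_1-1}
\left(\frac{Q}{P}\right)^{s_2-r_2-1}
\left(\frac{P-Q}{P}\right)^{s_1-r_1-1-s_2+r_2}
\frac{1}{P}
= Q^{s_2-r_2-1}\,(P-Q)^{s_1-r_1-1-s_2+r_2},
\\[4pt]
\text{since } (s_1-r_1-1) - (s_2-r_2-1) - (s_1-r_1-1-s_2+r_2) - 1 = 0 .
```

All powers of $P$ cancel, which is why only the factor $(1 - P)^{n-s_1+r_1}$ carries $P$ outside the difference $P - Q$.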


Since the range of $P$ is the interval $[Q, 1]$, the distribution of $Q$ is given by

(17)    $A\,Q^{s_2-r_2-1}\int_Q^1 (1 - P)^{n-s_1+r_1}(P - Q)^{s_1-r_1-1-s_2+r_2}\,dP.$

Let $R = P - Q$. Then we have

(18)    $\int_Q^1 (1 - P)^{n-s_1+r_1}(P - Q)^{s_1-r_1-1-s_2+r_2}\,dP$

        $= \int_0^{1-Q} (1 - Q - R)^{n-s_1+r_1}R^{s_1-r_1-1-s_2+r_2}\,dR$

        $= (1 - Q)^{n-s_2+r_2}\int_0^1 (1 - T)^{n-s_1+r_1}T^{s_1-r_1-1-s_2+r_2}\,dT.$

From (17) and (18) it follows that the probability density function of $Q$
is given by

(19)    $\dfrac{\Gamma(n + 1)}{\Gamma(s_2 - r_2)\,\Gamma(n - s_2 + r_2 + 1)}\,Q^{s_2-r_2-1}(1 - Q)^{n-s_2+r_2}\,dQ.$
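A quick numerical check of (19) is possible by simulation. The sketch below is an added illustration under stated assumptions, not part of the original derivation: it draws samples from the uniform density on the unit square (an arbitrary choice, for which the coverage $Q$ of a rectangle is simply its area), builds the limits (3) and (4), and compares the average coverage with the mean $(s_2 - r_2)/(n + 1)$ implied by the density (19). The function names and the default parameter values are hypothetical.

```python
import random


def rectangle_coverage(n, r1, s1, r2, s2, rng):
    """Coverage Q of the tolerance rectangle for one simulated sample.

    Assumption: (x1, x2) is uniform on the unit square, so the
    probability content of a rectangle equals its area.
    """
    # Sort the sample points by their first coordinate.
    pts = sorted((rng.random(), rng.random()) for _ in range(n))
    low1, high1 = pts[r1 - 1][0], pts[s1 - 1][0]          # limits (3)
    # Second coordinates of the s1 - r1 - 1 points strictly between the limits.
    inner = sorted(x2 for _, x2 in pts[r1:s1 - 1])
    low2, high2 = inner[r2 - 1], inner[s2 - 1]            # limits (4)
    return (high1 - low1) * (high2 - low2)


def mean_coverage(n=19, r1=3, s1=17, r2=1, s2=13, trials=20000, seed=0):
    """Monte Carlo estimate of the expected coverage E(Q)."""
    rng = random.Random(seed)
    total = sum(rectangle_coverage(n, r1, s1, r2, s2, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n, r2, s2 = 19, 1, 13
    print("simulated mean of Q :", round(mean_coverage(), 4))
    print("mean implied by (19):", (s2 - r2) / (n + 1))
```

Changing `r1` and `s1` while keeping `r2` and `s2` fixed should leave the simulated mean essentially unchanged, in line with the fact that (19) does not involve $r_1$ and $s_1$.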


3. The multivariate case. We may assume that no two elements of the matrix
$\| x_{i\alpha} \|$ ($i = 1, \cdots, p$; $\alpha = 1, \cdots, n$) are equal, since the probability of this
event is equal to 1. For each $\alpha$ let $t_\alpha$ ($\alpha = 1, \cdots, n$) be the point with the
coordinates $x_{1\alpha}, \cdots, x_{p\alpha}$. Let $x_{11}, \cdots, x_{1n}$ be the observations on $x_1$ arranged
in order of increasing magnitude. Then $L_1 = x_{1 r_1}$ and $M_1 = x_{1 s_1}$. The
quantities $L_i$ and $M_i$ ($i = 2, \cdots, p$) are defined in the following manner: Let $S$ be
the set of all points $t_\alpha$ for which

$L_j < x_{j\alpha} < M_j \qquad (j = 1, \cdots, i - 1).$

Arrange the i-th coordinates of the points in $S$ in order of increasing magnitude.
Then $L_i$ is equal to the $r_i$-th element and $M_i$ is equal to the $s_i$-th element of this
ordered sequence. We will derive the distribution of

(20)    $\int_{L_p}^{M_p} \cdots \int_{L_1}^{M_1} f(x_1, \cdots, x_p)\,dx_1 \cdots dx_p.$

Let

(21)    $Q_i = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}\int_{L_i}^{M_i} \cdots \int_{L_1}^{M_1} f(x_1, \cdots, x_p)\,dx_1 \cdots dx_p \qquad (i = 1, \cdots, p).$

Denote by $\psi_i(x_i, L_1, M_1, \cdots, L_{i-1}, M_{i-1})$ ($i = 2, \cdots, p$) the conditional
probability density function of $x_i$ calculated under the condition that
$L_j < x_j < M_j$ ($j = 1, \cdots, i - 1$). Let

(22)    $P_i = \int_{L_i}^{M_i} \psi_i(x_i, L_1, M_1, \cdots, L_{i-1}, M_{i-1})\,dx_i.$

We obviously have

(23)    $Q_{i+1} = Q_i P_{i+1} \qquad (i = 1, \cdots, p - 1).$

We will prove that the probability density function of $Q_i$ is given by

(24)    $\dfrac{\Gamma(n + 1)}{\Gamma(s_i - r_i)\,\Gamma(n - s_i + r_i + 1)}\,Q_i^{s_i-r_i-1}(1 - Q_i)^{n-s_i+r_i}\,dQ_i \qquad (i = 1, \cdots, p).$

This is certainly true for i = 1, 2. We will assume that it is true for i = j and
we will prove it for i = j + 1. It is easy to see that $Q_j$ and $P_{j+1}$ are independently
distributed and that the probability density function of $P_{j+1}$ is given by

(25)    $\dfrac{\Gamma(s_j - r_j)}{\Gamma(s_{j+1} - r_{j+1})\,\Gamma(s_j - r_j - s_{j+1} + r_{j+1})}\,P_{j+1}^{s_{j+1}-r_{j+1}-1}(1 - P_{j+1})^{s_j-r_j-1-s_{j+1}+r_{j+1}}\,dP_{j+1}.$

The joint distribution of $Q_j$ and $P_{j+1}$ is of the same form as the joint distribution
of $P$ and $\bar{P}$ in section 2. Hence the distribution of $Q_j P_{j+1}$ can be obtained
from the distribution of $Q = P\bar{P}$ by substituting $r_{j+1}$ for $r_2$ and $s_{j+1}$ for $s_2$. The
distribution of $Q$ is given in (19). Making the above substitution in formula
(19) we obtain formula (24) for i = j + 1. Hence the validity of (24) is proved
for $i = 1, 2, \cdots, p$. In particular, the distribution of $Q_p$ is given by

(26)    $\dfrac{\Gamma(n + 1)}{\Gamma(s_p - r_p)\,\Gamma(n - s_p + r_p + 1)}\,Q_p^{s_p-r_p-1}(1 - Q_p)^{n-s_p+r_p}\,dQ_p.$

It is interesting to note that the distribution of $Q_p$ does not depend on the
integers $r_1, s_1, \cdots, r_{p-1}, s_{p-1}$. The construction of the tolerance limits
$L_i$, $M_i$ ($i = 1, \cdots, p$), as proposed here, is somewhat asymmetric, since it
depends on the order of the variates $x_1, \cdots, x_p$. In practical applications the
asymmetry of the construction will be very slight, since in most practical cases
the integers $r_p$ and $s_p$ will be chosen so that $(s_p - r_p - 1)/n$ will be near to 1.
If, for example, $(s_p - r_p - 1)/n > .95$, the tolerance limits will be affected only
very slightly by a permutation of the variates $x_1, \cdots, x_p$. However, it would
be desirable to find a construction which is entirely independent of the order of
the variates $x_1, \cdots, x_p$.
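The sequential construction of the limits $L_i$, $M_i$ described in this section can be sketched in a few lines of code. The following is a minimal illustration, assuming the sample is held as a list of coordinate tuples and that ranks are counted from 1 as in the text; the function name `tolerance_limits` and its argument layout are hypothetical and do not come from the paper.

```python
def tolerance_limits(points, r, s):
    """Return [(L_1, M_1), ..., (L_p, M_p)] for a sample of p-dimensional points.

    points: list of equal-length tuples (the sample points t_alpha)
    r, s:   lists of 1-based ranks with r[i] < s[i] for every coordinate
    """
    limits = []
    remaining = list(points)
    for i, (r_i, s_i) in enumerate(zip(r, s)):
        # Order the i-th coordinates of the points still under consideration.
        ordered = sorted(pt[i] for pt in remaining)
        if not 1 <= r_i < s_i <= len(ordered):
            raise ValueError(
                f"ranks for coordinate {i + 1} do not fit "
                f"the {len(ordered)} remaining points"
            )
        lower, upper = ordered[r_i - 1], ordered[s_i - 1]
        limits.append((lower, upper))
        # Keep only the points strictly inside all limits found so far (the set S).
        remaining = [pt for pt in remaining if lower < pt[i] < upper]
    return limits


if __name__ == "__main__":
    sample = [(1, 5), (2, 1), (3, 4), (4, 2), (5, 3)]
    # First coordinate: smallest and largest value; second coordinate:
    # smallest and largest among the three points left in between.
    print(tolerance_limits(sample, r=[1, 1], s=[5, 3]))  # [(1, 5), (1, 4)]
```

Because each step filters the points before ranking the next coordinate, reordering the coordinates generally changes the limits, which is the asymmetry noted at the end of the section.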
4. Tolerance regions composed of several rectangles. For the sake of simplicity
we will consider here the bivariate case. All results obtained in this
section can be extended without any difficulty to the multivariate case.

In section 2 the tolerance region has been a single rectangle in the plane $(x_1, x_2)$
determined by the four lines $x_1 = L_1$, $x_1 = M_1$; $x_2 = L_2$ and $x_2 = M_2$. If the
variates $x_1$ and $x_2$ are strongly correlated, a tolerance region of rectangular shape
seems to be unfavorable, since it will cover an unnecessarily large area in the
$(x_1, x_2)$ plane. The situation is illustrated in figure 1 where the scatter of a
bivariate sample of size n = 19 is shown. Suppose we choose $r_1 = 3$, $s_1 = 17$,
$r_2 = 1$, $s_2 = 13$, then the tolerance region $T$, as defined in section 2, will be the
rectangle determined by the lines $x_1 = L_1 = x_{1,3}$; $x_1 = M_1 = x_{1,17}$;
$x_2 = L_2 = x'_{2,1}$, and $x_2 = M_2 = x'_{2,13}$. Now consider the tolerance region $T'$
consisting of 3 small rectangles $R_1$, $R_2$ and $R_3$ defined as follows:

The rectangle $R_1$ is determined by the vertical lines through $x_{1,1}$ and $x_{1,7}$ and
the horizontal lines through the sample points with smallest and largest ordinate,
restricting ourselves to points which have abscissa values in the interior of the
interval $[x_{1,1}, x_{1,7}]$. Similarly $R_2$ is determined by the vertical lines through
$x_{1,7}$ and $x_{1,13}$ and the horizontal lines through the sample points with largest
and smallest ordinate, restricting ourselves to points with abscissa values in the
interior of $[x_{1,7}, x_{1,13}]$. Finally $R_3$ is determined by the vertical lines through
$x_{1,13}$ and $x_{1,19}$ and the horizontal lines through the sample points with largest
and smallest ordinate, restricting ourselves to points whose abscissa values lie in
the interior of $[x_{1,13}, x_{1,19}]$. The region $T'$ consisting of the rectangles $R_1$, $R_2$
and $R_3$ has a much smaller area than the region $T$. As we will see later, the
probability distribution of the statistic $\iint_{T'} f(x_1, x_2)\,dx_1\,dx_2$ is exactly the same as
that of $\iint_{T} f(x_1, x_2)\,dx_1\,dx_2$. Thus the use of $T'$ may be preferred to that of $T$.


We will consider tolerance regions $T^*$ of the following general shape: Let
$m_1, \cdots, m_k$ be k positive integers such that $1 \le m_1$, $m_k \le n$ and $m_{i+1} - m_i \ge 3$
where n is the size of the bivariate sample. Let $V_i$ be the vertical line in the
$(x_1, x_2)$ plane given by the equation $x_1 = x_{1,m_i}$ ($i = 1, \cdots, k$). The number of
sample points which lie between the vertical lines $V_i$ and $V_{i+1}$ is obviously equal
to $m_{i+1} - m_i - 1$. Through each point which lies between the vertical lines
$V_i$ and $V_{i+1}$ we draw a horizontal line. In this way we obtain $m_{i+1} - m_i - 1$
horizontal lines $W_{i,1}, \cdots, W_{i,m_{i+1}-m_i-1}$ where the line $W_{i,j+1}$ is above the line
$W_{i,j}$. Denote by $R_{ij}$ ($i = 1, \cdots, k - 1$; $j = 1, \cdots, m_{i+1} - m_i - 2$) the
rectangle determined by the lines $V_i$, $V_{i+1}$, $W_{i,j}$, $W_{i,j+1}$. Let $T^*$ be a region
composed of s different rectangles $R_{ij}$. The regions $T$ and $T'$ in the example
illustrated in figure 1 are special cases of the type of regions $T^*$ as described
above. For the region $T$ we have k = 2, $m_1 = 3$, $m_2 = 17$, s = 12, and for the
region $T'$ we have k = 4, $m_1 = 1$, $m_2 = 7$, $m_3 = 13$, $m_4 = 19$ and s = 12.
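The count s for the two regions of the example can be reproduced mechanically. The sketch below is an added illustration covering only the special case in which every strip between consecutive vertical lines is filled from its lowest to its highest sample point, as for $T$ and $T'$; the function name `count_rectangles` is hypothetical.

```python
def count_rectangles(m):
    """Number s of elementary rectangles R_ij in a region of type T*.

    m: increasing list of ranks m_1, ..., m_k of the abscissas through
       which the vertical lines V_1, ..., V_k are drawn.
    Assumption: each strip is filled from its lowest to its highest
    sample point, so a strip with c interior points contributes c - 1
    rectangles, where c = m_{i+1} - m_i - 1.
    """
    gaps = [b - a for a, b in zip(m, m[1:])]
    if any(g < 3 for g in gaps):
        raise ValueError("consecutive ranks must differ by at least 3")
    return sum(g - 2 for g in gaps)


if __name__ == "__main__":
    print(count_rectangles([3, 17]))         # region T:  12
    print(count_rectangles([1, 7, 13, 19]))  # region T': 12
```

Both regions give s = 12, which is why they lead to the same distribution of the covered proportion even though $T'$ has a much smaller area.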

Let $Q^*$ be given by $\iint_{T^*} f(x_1, x_2)\,dx_1\,dx_2$. We will prove that the probability
density function of $Q^*$ is given by

(27)    $\dfrac{\Gamma(n + 1)}{\Gamma(s)\,\Gamma(n - s + 1)}\,Q^{*\,s-1}(1 - Q^*)^{n-s}\,dQ^*.$

Let $f_i(x_2)\,dx_2$ be the conditional distribution of $x_2$ under the restriction that
$x_{1,m_i} < x_1 < x_{1,m_{i+1}}$. Thus, we have

(28)    $f_i(x_2) = \dfrac{\int_{x_{1,m_i}}^{x_{1,m_{i+1}}} f(x_1, x_2)\,dx_1}{\int_{-\infty}^{+\infty}\int_{x_{1,m_i}}^{x_{1,m_{i+1}}} f(x_1, x_2)\,dx_1\,dx_2}.$

Denote by $\varphi(x_1)\,dx_1$ the marginal distribution of $x_1$, i.e.

$\varphi(x_1) = \int_{-\infty}^{+\infty} f(x_1, x_2)\,dx_2.$

Let

(29)    $P_i^* = \int_{x_{1,m_i}}^{x_{1,m_{i+1}}} \varphi(x_1)\,dx_1 \qquad (i = 1, \cdots, k - 1)$

and

(30)    $\bar{P}_i^* = \sum_j \int_{a_{ij}}^{b_{ij}} f_i(x_2)\,dx_2 \qquad (i = 1, \cdots, k - 1)$

where $a_{ij}$ is the ordinate of the lower corners and $b_{ij}$ is the ordinate of the upper
corners of the rectangle $R_{ij}$ and the summation is to be taken over all values of j
for which $R_{ij}$ is included in $T^*$. It is clear that

(31)    $Q^* = P_1^*\bar{P}_1^* + \cdots + P_{k-1}^*\bar{P}_{k-1}^*.$

Let $y$ be any random variable which has a continuous probability density
function, say $\psi(y)\,dy$. Furthermore let $y_1, \cdots, y_n$ be n independent observations
on $y$, arranged in order of increasing magnitude. Let $\psi_i(y)\,dy$ be the conditional
density function of $y$ under the condition that $y$ is restricted to the interval
$[y_{m_i}, y_{m_{i+1}}]$. Let


(32)    $P' = \sum_i \sum_j \int_{y_{m_i+j}}^{y_{m_i+j+1}} \psi(y)\,dy$

where the summation is taken over all pairs i, j for which $R_{ij}$ is contained in $T^*$.
Let

$P'_i = \int_{y_{m_i}}^{y_{m_{i+1}}} \psi(y)\,dy,$

and

$\bar{P}'_i = \sum_j \int_{y_{m_i+j}}^{y_{m_i+j+1}} \psi_i(y)\,dy,$

where the summation is to be taken over all values j for which $R_{ij}$ is contained
in $T^*$. We obviously have

(33)    $P' = P'_1\bar{P}'_1 + \cdots + P'_{k-1}\bar{P}'_{k-1}.$

It is easy to verify that (i) the joint distribution of $P'_1, \cdots, P'_{k-1}$ is the same as
the joint distribution of $P_1^*, \cdots, P_{k-1}^*$; (ii) the distribution of $\bar{P}'_i$ is the same
as that of $\bar{P}_i^*$ ($i = 1, \cdots, k - 1$); (iii) the variates $\bar{P}'_1, \cdots, \bar{P}'_{k-1}$ are
independent of each other and also of $P'_1, \cdots, P'_{k-1}$; (iv) the variates
$\bar{P}_1^*, \cdots, \bar{P}_{k-1}^*$ are independent of each other and also of $P_1^*, \cdots, P_{k-1}^*$.
Hence it follows from (31) and (33) that the distribution of $Q^*$ is the same as
that of $P'$. Now we will derive the distribution of $P'$. The expression $P'$ can be
written in the following form:

(34)    $P' = \sum_{i=1}^{l} \int_{y_{r_i}}^{y_{s_i}} \psi(y)\,dy,$

where $r_1, s_1, \cdots, r_l, s_l$ are some positive integers for which
$1 \le r_1 < s_1 < r_2 < s_2 < \cdots < r_l < s_l \le n$. Let

(35)    $P'' = \sum_{i=1}^{l-1} \int_{y_{r_i}}^{y_{s_i}} \psi(y)\,dy + \int_{y_{s_{l-1}}}^{y_{s_{l-1}+s_l-r_l}} \psi(y)\,dy.$

For any fixed value $y_{s_{l-1}}$ denote by $g(y)$ the conditional probability density of $y$
under the restriction that $y < y_{s_{l-1}}$ and by $h(y)$ the conditional distribution of $y$
under the restriction that $y > y_{s_{l-1}}$. Let

$P = \int_{-\infty}^{y_{s_{l-1}}} \psi(y)\,dy, \qquad P_1 = \sum_{i=1}^{l-1} \int_{y_{r_i}}^{y_{s_i}} g(y)\,dy;$

$P_2 = \int_{y_{r_l}}^{y_{s_l}} h(y)\,dy \quad\text{and}\quad P_3 = \int_{y_{s_{l-1}}}^{y_{s_{l-1}+s_l-r_l}} h(y)\,dy.$

Then it follows from (34) and (35) that

(36)    $P' = PP_1 + (1 - P)P_2, \qquad P'' = PP_1 + (1 - P)P_3.$

For calculating the distributions of $P_2$ and $P_3$ we may consider the variates
$y_{s_{l-1}+1}, \cdots, y_n$ as $n - s_{l-1}$ independent observations drawn from a population
which has the distribution $h(y)\,dy$. Hence, the distribution of $P_2$ can be
derived from (13) and it is easy to verify that the distribution of $P_3$ is the same as
that of $P_2$. It is clear that $P_2$ is independent of $P$ and $P_1$. Similarly $P_3$ is
independent of $P$ and $P_1$. Hence, because of (36) the distribution of $P'$ must
be the same as that of $P''$.

In the same way we find that the distribution of $P''$ is the same as the
distribution of

$P''' = \sum_{i=1}^{l-2} \int_{y_{r_i}}^{y_{s_i}} \psi(y)\,dy + \int_{y_{s_{l-2}}}^{y_{s_{l-2}+s_{l-1}-r_{l-1}+s_l-r_l}} \psi(y)\,dy.$

Thus, by induction we see that the distribution of $P'$ is the same as the
distribution of $P_0 = \int_{y_{r_1}}^{y_{r_1+s}} \psi(y)\,dy$ where $s = \sum_{i=1}^{l} (s_i - r_i)$. From (13)
it follows that the distribution of $P_0$ is given by

$\dfrac{\Gamma(n + 1)}{\Gamma(s)\,\Gamma(n - s + 1)}\,P_0^{s-1}(1 - P_0)^{n-s}\,dP_0.$

Hence, we have proved that the distribution of Q* is given by (27). 
5. Summary of the results and numerical illustrations. I shall give here a
summary of the results obtained and a few illustrative examples. The
multivariate case being a straightforward extension of the bivariate case, I shall
discuss merely the latter. Consider a pair of random variables $x$ and $y$. Denote
by $f(x, y)\,dx\,dy$ the joint probability density function of $x$ and $y$ and suppose
that nothing is known about $f(x, y)$ except that it is continuous. A sample of
n pairs of independent observations $(x_1, y_1), \cdots, (x_n, y_n)$ is drawn from this
bivariate population. The sample can be represented by n points $p_1, \cdots, p_n$
in the plane $(x, y)$, $p_i$ being the point with the coordinates $x_i$ and $y_i$. In section 2
we have dealt with the problem of finding a rectangle $T$ in the plane $(x, y)$,
called tolerance region, such that we can state with high probability, say with
probability .98 or .99, that the proportion $Q$ of the bivariate universe included
in the rectangle $T$ is not less than a given number $b$, say not less than .98 or .99.
The rectangle $T$ is constructed as follows: Suppose that the points $p_1, \cdots, p_n$
are arranged in order of increasing magnitude of their abscissa values, i.e.
$x_1 < x_2 < \cdots < x_n$. We draw a vertical line $V_{r_1}$ through the point $p_{r_1}$ and a
vertical line $V_{s_1}$ through $p_{s_1}$ where $r_1$ and $s_1$ are positive integers such that
$1 \le r_1$, $r_1 \le s_1 - 3$ and $s_1 \le n$. We consider the set $S$ consisting of the points
$p_{r_1+1}, \cdots, p_{s_1-1}$ which lie between the vertical lines $V_{r_1}$ and $V_{s_1}$. We draw a
horizontal line $H_{r_2}$ through the point of $S$ which has the $r_2$-th smallest ordinate
in $S$. Finally a horizontal line $H_{s_2}$ is drawn through the point of $S$ which has
the $s_2$-th smallest ordinate in $S$. The values $r_2$ and $s_2$ are positive integers for
which $r_2 < s_2$. The tolerance region $T$ is the rectangle determined by the lines
$V_{r_1}$, $V_{s_1}$, $H_{r_2}$ and $H_{s_2}$. The probability $p$ that at least the proportion
$b$ ($0 < b < 1$) of the universe is included in $T$ is given by

(37)    $p = \dfrac{\Gamma(n + 1)}{\Gamma(s_2 - r_2)\,\Gamma(n - s_2 + r_2 + 1)} \int_b^1 Q^{s_2-r_2-1}(1 - Q)^{n-s_2+r_2}\,dQ.$


It is known that if a random variable $v$ ($0 < v < 1$) has the distribution

(38)    $\dfrac{\Gamma(c + d)}{\Gamma(c)\,\Gamma(d)}\,v^{c-1}(1 - v)^{d-1}\,dv,$

and 2c and 2d are positive integers, then $\dfrac{2c}{2d}\cdot\dfrac{1 - v}{v}$ has the F-distribution (analysis
of variance distribution) with 2d and 2c degrees of freedom. Thus,

(39)    $\dfrac{2(s_2 - r_2)}{2(n - s_2 + r_2 + 1)}\cdot\dfrac{1 - Q}{Q} = F$

has the F-distribution with $2(n - s_2 + r_2 + 1)$ and $2(s_2 - r_2)$ degrees of freedom.
From (37) it follows that $p$ is equal to the probability that

$F \le \dfrac{2(s_2 - r_2)}{2(n - s_2 + r_2 + 1)}\cdot\dfrac{1 - b}{b}$

where $F$ has the analysis of variance distribution with $2(n - s_2 + r_2 + 1)$ and
$2(s_2 - r_2)$ degrees of freedom. For the case $r_1 = 1$, $s_1 = n$, $r_2 = 1$ and
$s_2 = n - 2$, the following table gives the value of the sample size n which is
necessary for having the probability $p$ that at least the proportion $b$ of the
universe is included in the tolerance rectangle $T$.

               b = .97    b = .975    b = .98    b = .985    b = .99
  p = .99        332        398         499        668        1001
  p = .95         —       [illeg.]      385        515         771

Thus, if we want the probability to be .99 that the tolerance region will include 
at least 98 per cent of the universe, the sample size must be 499. 

In section 4 tolerance regions are considered which are composed of several
rectangles. Such a tolerance region may be more favorable than a single
rectangle if $x$ and $y$ are highly correlated. As an illustration we consider tolerance
regions $T^*$ constructed as follows: Suppose that n is divisible by 4 and the sample
points $p_1, \cdots, p_n$ are arranged in order of increasing magnitude of their abscissa
values. We draw the vertical lines $V_0$, $V_1$, $V_2$, $V_3$ and $V_4$ through the points
$p_1$, $p_{n/4}$, $p_{n/2}$, $p_{3n/4}$ and $p_n$. Let $R_i$ (i = 1, 2, 3, 4) be the rectangle determined
by the vertical lines $V_{i-1}$ and $V_i$ and the horizontal lines $H_i$ and $H'_i$ where $H_i$
and $H'_i$ are defined as follows: consider only the points which lie between the
two vertical lines $V_{i-1}$ and $V_i$ (points on the vertical lines are excluded). From
these select the point with the smallest and the point with the largest ordinate.
The lines $H_i$ and $H'_i$ are the horizontal lines which go through these two points
respectively. The tolerance region $T^*$ is composed of the four rectangles $R_1$,
$R_2$, $R_3$ and $R_4$. The number of rectangles $R_{ij}$ (defined in section 4) included in
$T^*$ is equal to s = n - 9. Thus, according to the results of section 4 the
probability distribution of the proportion $Q^*$ of the universe included in the region
$T^*$ is given by

$\dfrac{\Gamma(n + 1)}{\Gamma(n - 9)\,\Gamma(10)}\,Q^{*\,n-10}(1 - Q^*)^{9}\,dQ^*.$

Numerical calculations show that for n = 1000 the probability is .99 that at least
98.1 per cent of the universe will be included in the tolerance region $T^*$.
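The numerical statements of this section can be checked without tables of the F-distribution. The sketch below is an added illustration, not the author's method of computation: it uses the standard identity between a Beta tail and a binomial sum (the proportion covered has the distribution of the s-th smallest of n uniform variates), where s is the number of elementary rectangles, that is $s = s_2 - r_2$ for the single rectangle of (37). The function name `coverage_probability` is hypothetical.

```python
from math import comb


def coverage_probability(n, s, b):
    """P(Q >= b) when Q has the density Gamma(n+1)/(Gamma(s)Gamma(n-s+1)) Q^(s-1) (1-Q)^(n-s).

    Q is distributed as the s-th smallest of n independent uniform
    variates, so Q >= b exactly when at least n - s + 1 of them are >= b.
    """
    # Probability that at most n - s of the n uniform variates are >= b.
    tail = sum(
        comb(n, k) * (1.0 - b) ** k * b ** (n - k)
        for k in range(n - s + 1)
    )
    return 1.0 - tail


if __name__ == "__main__":
    # Single rectangle with r1 = 1, s1 = n, r2 = 1, s2 = n - 2, hence s = n - 3.
    print(coverage_probability(499, 499 - 3, 0.98))
    print(coverage_probability(385, 385 - 3, 0.98))
    # Four rectangles on the quartile strips, hence s = n - 9.
    print(coverage_probability(1000, 1000 - 9, 0.981))
```

Under these assumptions the three printed values should come out close to .99, .95 and .99 respectively, in agreement with the table entries for b = .98 and with the statement for n = 1000.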







ASYMPTOTIC FORMULAS FOR SIGNIFICANCE LEVELS OF CERTAIN

DISTRIBUTIONS

By Alfred M. Peiser 
Cornell University 

1. Introduction. The purpose of this paper is to derive asymptotic formulas
for the significance levels, or per cent points, of certain well-known statistical
distributions.$^1$ Although we restrict ourselves here to two distributions, those
of Chi-Square and of Student's t, it will be apparent that the methods used are
applicable to many other distributions as well.

The following results are obtained. Let $y_p$ be the p per cent point of the
normal distribution, that is, the distribution defined by

(1.1)    $\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt,$

so that

(1.2)    $\Phi(y_p) = 1 - p.$

If $\chi^2_{p,n}$ and $t_{p,n}$ denote the p per cent points of the Chi-Square and Student's t
distributions with n degrees of freedom respectively, then

(1.3)    $\chi^2_{p,n} = n + y_p\sqrt{2n} + \tfrac{2}{3}(y_p^2 - 1) + \dfrac{y_p^3 - 7y_p}{9\sqrt{2n}} + o\!\left(\dfrac{1}{\sqrt{n}}\right),$

and

(1.4)    $t_{p,n} = y_p + \dfrac{y_p^3 + y_p}{4n} + o\!\left(\dfrac{1}{n}\right).$

These formulas approximate the true values of $\chi^2_{p,n}$ and $t_{p,n}$ to a high degree of
accuracy. Tables of comparative values for several values of p and n are given
in Section 4.
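Formulas (1.3) and (1.4) are easy to evaluate numerically once the normal per cent point $y_p$ is available. The sketch below is an added illustration: it drops the remainder terms, obtains $y_p$ from the standard library's normal quantile function, and uses hypothetical function names.

```python
from math import sqrt
from statistics import NormalDist


def normal_point(p):
    """y_p defined by Phi(y_p) = 1 - p, as in (1.2)."""
    return NormalDist().inv_cdf(1.0 - p)


def chi_square_point(p, n):
    """Approximate p per cent point of Chi-Square with n degrees of freedom, from (1.3)."""
    y = normal_point(p)
    root = sqrt(2 * n)
    return n + y * root + (2.0 / 3.0) * (y * y - 1.0) + (y ** 3 - 7.0 * y) / (9.0 * root)


def student_t_point(p, n):
    """Approximate p per cent point of Student's t with n degrees of freedom, from (1.4)."""
    y = normal_point(p)
    return y + (y ** 3 + y) / (4.0 * n)


if __name__ == "__main__":
    print(round(chi_square_point(0.01, 10), 3))  # compare with the first entry of Table I
    print(round(student_t_point(0.025, 10), 3))  # compare with Table II
```

For p = .5 the normal point is zero, so (1.3) reduces to n - 2/3 and (1.4) to zero, which gives a convenient sanity check on any implementation.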

We shall need the following theorem due to Cramér [3, p. 81; see also pp.
86-87].

Theorem 1: Let $X_1, X_2, \cdots$ be a sequence of independent, identically distributed
random variables having an absolutely continuous distribution function with
mean value zero, dispersion $\sigma$ and finite fifth absolute moment. Let $H_n(x)$ be the
distribution function of $(X_1 + \cdots + X_n)/(\sigma\sqrt{n})$, and let $\lambda_r$ denote the
r-th semi-invariant of $H_1(x)$. Then

(1.5)    $\Phi(x) - H_n(x) = \dfrac{\lambda_3}{6\sqrt{n}}\,\Phi^{(3)}(x) - \dfrac{\lambda_4}{4!\,n}\,\Phi^{(4)}(x) - \dfrac{\lambda_3^2}{72\,n}\,\Phi^{(6)}(x) + O(n^{-3/2}).$


1 This problem was proposed to the author by J. H. Curtiss.



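2. The Chi-Square distribution. A random variable $X$ is said to be distributed
according to Chi-Square with n degrees of freedom ($X = \chi^2_n$) if its distribution
function is

(2.1)    $F_n(x) = \dfrac{1}{2^{n/2}\,\Gamma(\tfrac{1}{2}n)} \int_0^x t^{\frac{1}{2}n-1}e^{-\frac{1}{2}t}\,dt$ for $x \ge 0$, and $F_n(x) = 0$ for $x < 0$.

The variable $(\chi^2_n - n)/\sqrt{2n}$ then has the distribution function

(2.2)    $H_n(x) = F_n(n + x\sqrt{2n}).$

If we write

(2.3)    $\chi^2_{p,n} = n + y_p\sqrt{2n} + a_n,$

so that

(2.4)    $F_n(\chi^2_{p,n}) = 1 - p,$

and let $z_{pn} = y_p + a_n/\sqrt{2n}$, then $H_n(z_{pn}) = 1 - p$, and it follows from (1.1)
and (1.2) that

(2.5)    $\Phi(z_{pn}) - H_n(z_{pn}) = \Phi(z_{pn}) - \Phi(y_p) = \dfrac{1}{\sqrt{2\pi}}\,\dfrac{a_n}{\sqrt{2n}}\,\exp\left\{-\tfrac{1}{2}\left(y_p + \theta\,\dfrac{a_n}{\sqrt{2n}}\right)^2\right\},$

where $0 < \theta < 1$. Then by a theorem of Liapounoff's [3, p. 77],

$\dfrac{|a_n|}{\sqrt{2n}}\,\exp\left\{-\tfrac{1}{2}\left(y_p + \theta\,\dfrac{a_n}{\sqrt{2n}}\right)^2\right\} \le K\,\dfrac{\log n}{\sqrt{n}},$

where $K$ denotes a constant independent of n. But if $\lim |a_n|/\sqrt{2n} = \infty$,
then $\lim_{n\to\infty} H_n(z_{pn})$ is either 0 or 1. Hence $a_n = o(\sqrt{n})$.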

n—»oo 

Fisher [1, p. 81] has suggested the use of

$\chi^2_{p,n} = \tfrac{1}{2}\left(y_p + \sqrt{2n - 1}\right)^2.$

A closer approximation,

$\chi^2_{p,n} = n\left(1 - \dfrac{2}{9n} + y_p\sqrt{\dfrac{2}{9n}}\right)^3,$

has been obtained by Wilson and Hilferty [2]. It is interesting to note that,
according to (1.3), this last approximation is correct to terms of the zero-th
order in n.
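The two earlier approximations are simple to compare numerically. The sketch below is an added illustration; it takes the formulas in their commonly quoted forms, $\tfrac{1}{2}(y_p + \sqrt{2n-1})^2$ for Fisher's and $n(1 - 2/(9n) + y_p\sqrt{2/(9n)})^3$ for Wilson and Hilferty's, and compares them with the tabulated value 23.209 of the 1 per cent point for 10 degrees of freedom. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fisher_point(p, n):
    """Fisher's approximation: sqrt(2 chi^2) - sqrt(2n - 1) is roughly standard normal."""
    y = NormalDist().inv_cdf(1.0 - p)
    return 0.5 * (y + sqrt(2 * n - 1)) ** 2


def wilson_hilferty_point(p, n):
    """Wilson-Hilferty approximation: (chi^2 / n)^(1/3) is roughly normal."""
    y = NormalDist().inv_cdf(1.0 - p)
    return n * (1.0 - 2.0 / (9 * n) + y * sqrt(2.0 / (9 * n))) ** 3


if __name__ == "__main__":
    tabulated = 23.209  # 1 per cent point of Chi-Square, 10 degrees of freedom
    for name, value in (
        ("Fisher", fisher_point(0.01, 10)),
        ("Wilson-Hilferty", wilson_hilferty_point(0.01, 10)),
    ):
        print(f"{name:16s} {value:8.3f}  error {value - tabulated:+.3f}")
```

At this small number of degrees of freedom the cube-root form is markedly closer to the tabulated value than the square-root form, consistent with the remark that it is correct to a higher order in n.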

We apply Theorem 1 to the variables $X_j = (\chi^2_1 - 1)/\sqrt{2}$, $j = 1, 2, \cdots$.
Then $\sigma = 1$, and, by the additive property of the Chi-Square distribution
[3, p. 45], $H_n(x)$, the distribution function of the variable $(X_1 + \cdots + X_n)/\sqrt{n}$,
is related to $F_n(x)$ by (2.2). Thus, $\lambda_3 = 2\sqrt{2}$. It follows from (1.5) and
(2.3) that

(2.6)    $\lim_{n\to\infty} \sqrt{2n}\,[\Phi(z_{pn}) - H_n(z_{pn})] = \lim_{n\to\infty} \dfrac{2}{3\sqrt{2\pi}}\,(z_{pn}^2 - 1)\,e^{-\frac{1}{2}z_{pn}^2} = \dfrac{2}{3\sqrt{2\pi}}\,(y_p^2 - 1)\,e^{-\frac{1}{2}y_p^2},$

since $a_n = o(\sqrt{n})$. Then by (2.5) and (2.6)

$\lim_{n\to\infty} a_n = \tfrac{2}{3}(y_p^2 - 1).$

According to (2.3) we may now write

$\chi^2_{p,2n} = 2n + 2y_p\sqrt{n} + 2r_p + 2b_n,$

where

(2.7)    $r_p = \tfrac{1}{3}(y_p^2 - 1),$

and $b_n = o(1)$. A simple change of variables in (2.1) yields

M [, + £ + g]~ *. 

If we let 


(2.9) 

then 


/•M-jL i 

J„ = / l 

J Vr V2 tt 


(2-10) nj n = h <*,+«.) i 

\/2f 

where <5„ = o(l). By (1.2) and (2.4), 


( 2 . 11 ) 



^2n(Xp,2n). 


Using Stirling’s formula for T(« + 1) in (2.8), (2.11) becomes 
J ~ ~ [“ p {' is+ 
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wb m A.-U-' - 

n\ 12 


a i? 
2 ' 2 


r a , V , 3 

r p — tt + o + v r p 


0 + ^»(ii -■■■-"■.) + AW. 

n/„(y) being dominated by P(j v |), where P is a polynomial in v independent of 
n, and f n (v) -- 0 (n" m ). Then 

J ' “ L‘- nW [“ & - T ’ - 2 + (' + 2r ’ + \ 

v** 


(2.i2) _e( 1 i + r 3 -) + .» i .]* + / 

v» 




V2m l3 

l»* /” s/' 


v — i>r„ 


do 


+r..c s .w*+r. (£#)*, 

. w V 2** Jv p + \Z2 tt \)-« jl / 

v " 


'v,+ b “ V2n 

V" 


where $g_n(v)$ has the same properties given above for $f_n(v)$. If we call these last
integrals $K_1$, $K_2$, $K_3$ and $K_4$ respectively, we see that


(2.13) 


lim nKs 




-l* 1 


, lim ng„(y) dv « 0. 

’v, V2ir *—» 


Also, since An , j ~ 1, 2, • • • is dominated by P,(\ v |), Pj(v) being a poly¬ 
nomial in v independent of n, we she that 


±1 


e~ u ' \A 


i J v P +-% V2r j\ 

V n 


/1 3 




where $Q_j$ is a polynomial. Since this last sum converges, we have


(2.14) 


nKi 


±! 

j"3 


' !lt nAl 


i "dv, 

»,+ * V2t jd 

V * 


and by the uniform convergence of (2.14), 

(2.15) lim nKt — X) f lim n/l’„ dv - 0, 

n-*w j-3 j'!y2!r n~*w 


since = 0(n //s ). 

Integrating by parts we obtain


nKi 


y/ 2x 


r (J 


ypb n + 



> 


lim nX s = 0. 


and since b n = o(l), 
(2.16) 
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Further integration by parts and the use of (2.7) yields 
(2.17) lim nKi = {y\ - 7</„). 

n-*» 3 ov2tt 

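Then, by (2.10), (2.12), (2.13), (2.15), (2.16) and (2.17),

$\lim_{n\to\infty} b_n\sqrt{n} = \tfrac{1}{36}\,(y_p^3 - 7y_p),$

so that

$\chi^2_{p,2n} = 2n + 2y_p\sqrt{n} + \tfrac{2}{3}(y_p^2 - 1) + \dfrac{y_p^3 - 7y_p}{18\sqrt{n}} + o\!\left(\dfrac{1}{\sqrt{n}}\right).$

Equation (1.3) now follows at once.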


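3. Student's t. If the random variable $Y_n$ has the distribution function
$\Phi(x/\sqrt{n})$, then $t_n = Y_n/\chi_n$ is distributed according to Student's distribution
for n degrees of freedom and has the distribution function

$G_n(x) = \int_{-\infty}^{x} \dfrac{1}{\sqrt{n\pi}}\,\dfrac{\Gamma[\tfrac{1}{2}(n + 1)]}{\Gamma[\tfrac{1}{2}n]}\left(1 + \dfrac{t^2}{n}\right)^{-\frac{1}{2}(n+1)} dt.$

If $\sigma = \sqrt{n/(n - 2)}$, the variable $t_n/\sigma$ then has the distribution function

(3.1)    $H_n(x) = G_n(\sigma x).$

If we write

(3.2)    $t_{p,n} = y_p + a_n,$

so that

(3.3)    $G_n(t_{p,n}) = 1 - p,$

and let $z_{pn} = t_{p,n}/\sigma$, then $H_n(z_{pn}) = 1 - p$, and it follows from (1.1) and (1.2)
that

(3.4)    $\Phi(z_{pn}) - H_n(z_{pn}) = \Phi(z_{pn}) - \Phi(y_p) = \dfrac{1}{\sqrt{2\pi}}\left[y_p\left(\dfrac{1}{\sigma} - 1\right) + \dfrac{a_n}{\sigma}\right]\exp\left\{-\tfrac{1}{2}\left[y_p + \theta\left(y_p\left(\dfrac{1}{\sigma} - 1\right) + \dfrac{a_n}{\sigma}\right)\right]^2\right\},$

where $0 < \theta < 1$. Then by Liapounoff's Theorem [3, p. 77],

$\left|\,y_p\left(\dfrac{1}{\sigma} - 1\right) + \dfrac{a_n}{\sigma}\right|\exp\left\{-\tfrac{1}{2}\left[y_p + \theta\left(y_p\left(\dfrac{1}{\sigma} - 1\right) + \dfrac{a_n}{\sigma}\right)\right]^2\right\} \le K\,\dfrac{\log n}{\sqrt{n}},$

where $K$ denotes a constant independent of n. But if $\lim |a_n| = \infty$, then
$\lim_{n\to\infty} H_n(z_{pn})$ is either 0 or 1.

Hence $a_n = o(1)$.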
We apply Theorem 1 to the variables $X_j = Y_n/\chi_n$, $j = 1, 2, \ldots$. Then
$\sigma = \sqrt{n/(n-2)}$, and by the additive property of the normal distribution,
$H_n(x)$, the distribution function of $(X_1 + \cdots + X_n)/(\sigma\sqrt{n})$, satisfies the
relation (3.1). Thus $\lambda_3 = 0$ and $\lambda_4 = 6n/(n-4)$. It follows from (1.5) and
(3.2) that

(3.5) $$\lim_{n\to\infty} n[\Phi(z_{pn}) - H_n(z_{pn})] = \lim_{n\to\infty} \frac{n}{4(n-4)\sqrt{2\pi}}\,(z_{pn}^3 - 3z_{pn})\,e^{-\frac{1}{2}z_{pn}^2} = \frac{1}{4\sqrt{2\pi}}\,(y_p^3 - 3y_p)\,e^{-\frac{1}{2}y_p^2},$$

since $a_n = o(1)$. By (3.4) and (3.5) we have

$$\lim_{n\to\infty} n\left[y_p\left(\frac{1}{\sigma} - 1\right) + \frac{a_n}{\sigma}\right] = \frac{y_p^3 - 3y_p}{4}.$$

But $\lim_{n\to\infty} n(1 - \sigma)/\sigma = -1$, so that

$$\lim_{n\to\infty} \frac{n a_n}{\sigma} = \frac{y_p^3 + y_p}{4}.$$

Hence

$$a_n = \frac{y_p^3 + y_p}{4n} + o\left(\frac{1}{n}\right),$$

and equation (1.4) follows at once.
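As a rough numerical check of the result just derived, the sketch below evaluates $t_{p,n} \approx y_p + (y_p^3 + y_p)/(4n)$ in Python. It assumes that $y_p$ is the upper-tail normal deviate defined by $\Phi(y_p) = 1 - p$, which is how (3.3) is read here; the function name `t_approx` is illustrative only.

```python
from statistics import NormalDist


def t_approx(p, n):
    """Approximate upper-tail percentage point of Student's t for n degrees of freedom.

    Uses y_p + (y_p**3 + y_p) / (4 n), with y_p assumed to satisfy Phi(y_p) = 1 - p.
    """
    y = NormalDist().inv_cdf(1.0 - p)
    return y + (y ** 3 + y) / (4.0 * n)


if __name__ == "__main__":
    for n in (10, 30, 120):
        print(n, round(t_approx(0.025, n), 3))
```

For n = 30 and p = .025 the sketch gives 2.039 to three decimals, against a true value of 2.042.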


4. Tables. The following tables compare the true values of $\chi^2_{p,n}$ and $t_{p,n}$
with those obtained from (1.3) and (1.4). The true values [4], [5] (to three
decimal places) are shown in the rows marked "true."


TABLE I

$\chi^2_{p,n}$

   n              p = .01         .05        .1        .5        .9

  10   approx.     23.253      18.318    15.989     9.333     4.875
       true        23.209      18.307    15.987     9.342     4.865

  30   approx.     50.908      43.777    40.257    29.333    20.600
       true        50.892      43.773    40.256    29.336    20.599

  50   approx.  [illegible]    67.507    63.168    49.333    37.689
       true     [illegible]    67.505    63.167    49.335    37.689

 100   approx.    135.811     124.343   118.499    99.333    82.358
       true       135.807     124.342   118.498    99.334    82.358


TABLE II

$t_{p,n}$

   n              p = .0125     .025        .05      [illegible]  [illegible]

  10   approx.      2.579      2.197       1.797     [illegible]  [illegible]
       true         2.634   [illegible]    1.813     [illegible]  [illegible]

  30   approx.      2.354      2.039       1.696       1.171        0.683
       true         2.360      2.042       1.697       1.173        0.683

  60   approx.      2.298      2.000    [illegible]  [illegible]  [illegible]
       true         2.299      2.000       1.671     [illegible]  [illegible]

 120   approx.      2.270      1.980       1.658       1.156        0.677
       true         2.270      1.980       1.658     [illegible]    0.677
GENERALIZATION OF POINCARÉ'S FORMULA IN THE THEORY OF PROBABILITY


By Kai Lai Chung
Tsing Hua University, Kunming, China


Let $p_{[m]}(1, \ldots, n)$, $(0 \le m \le n)$ denote the probability of the occurrence of
exactly m events among the n arbitrary events $E_1, \ldots, E_n$; and $p_m(1, \ldots, n)$
$(1 \le m \le n)$ that of at least m. Let $p_{\nu_1 \cdots \nu_i}$, $(1 \le i \le n)$, where $(\nu_1 \cdots \nu_i)$
is a combination (without repetition) out of $(1, \ldots, n)$, denote the probability
of the occurrence of $E_{\nu_1}, \ldots, E_{\nu_i}$ (without regard to the other events); and

$$S_0 = 1, \qquad S_i = \sum_{(\nu_1 \cdots \nu_i)} p_{\nu_1 \cdots \nu_i},$$

where the summation extends to all the combinations with i members out of
$(1, \ldots, n)$.

Then Poincaré's formula may be written as follows:

$$p_{[0]}(1, \ldots, n) = \sum_{i=0}^{n} (-1)^i S_i.$$

An equivalent formula is:

$$p_1(1, \ldots, n) = \sum_{i=1}^{n} (-1)^{i-1} S_i.$$

The following conventions concerning the binomial coefficients are made:

$$\binom{a}{b} = 0 \quad \text{if } a < b \text{ or } b < 0.$$

Two generalizations, possibly due to de Mises, are

$$p_{[m]}(1, \ldots, n) = \sum_{i=m}^{n} (-1)^{i-m} \binom{i}{m} S_i; \qquad p_m(1, \ldots, n) = \sum_{i=m}^{n} (-1)^{i-m} \binom{i-1}{m-1} S_i.$$

We notice that the probabilities appearing on the left-hand sides of these
formulas are symmetrical with respect to the set of suffixes $(1, \ldots, n)$, and the
sums on the right-hand sides are symmetrical in the same way.

As a natural generalization let us consider a probability which is symmetrical
with respect to certain sub-sets of $(1, \ldots, n)$. We divide the n events into r
sets:

$$E_{\nu_{11}}, \ldots, E_{\nu_{1n_1}};\ E_{\nu_{21}}, \ldots, E_{\nu_{2n_2}};\ \ldots;\ E_{\nu_{r1}}, \ldots, E_{\nu_{rn_r}}$$

where $n_1 + n_2 + \cdots + n_r = n$. And we ask for the probability that out of the
first set of $n_1$ events exactly $m_1$ events occur; and out of the second set of $n_2$ events
exactly $m_2$ events occur; and so on; and finally, out of the rth set of $n_r$ events
exactly $m_r$ events occur. When this problem is solved the analogous problem
in which we replace some of the words "exactly" by "at least," can also be
solved.

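We denote the required probability by the left-hand side of the following
generalized Poincaré's formula:

(1) $$p_{[m_1], [m_2], \ldots, [m_r]}(\nu_{11} \cdots \nu_{1n_1};\ \nu_{21} \cdots \nu_{2n_2};\ \ldots;\ \nu_{r1} \cdots \nu_{rn_r}) = \sum_{i_1=m_1}^{n_1} \sum_{i_2=m_2}^{n_2} \cdots \sum_{i_r=m_r}^{n_r} (-1)^{i_1 + \cdots + i_r - m_1 - \cdots - m_r} \binom{i_1}{m_1}\binom{i_2}{m_2} \cdots \binom{i_r}{m_r} S_{i_1, i_2, \ldots, i_r},$$

where

$$S_{i_1, i_2, \ldots, i_r} = \sum p_{\alpha_{11} \cdots \alpha_{1i_1}\, \alpha_{21} \cdots \alpha_{2i_2}\, \cdots\, \alpha_{r1} \cdots \alpha_{ri_r}},$$

the summation extending to all those combinations of $\alpha$'s such that for every
$k = 1, \ldots, r$, $(\alpha_{k1} \cdots \alpha_{ki_k})$ is a combination of $i_k$ members out of
$(\nu_{k1} \cdots \nu_{kn_k})$.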
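Proof: Let $p_{[\nu_1 \cdots \nu_i]}$ denote the probability of the occurrence of the events
$E_{\nu_1}, \ldots, E_{\nu_i}$ and these only out of $E_1, \ldots, E_n$. It is well-known and also
easily seen that

$$p_{\alpha_1 \cdots \alpha_a} = \sum_{b=0}^{n-a} \sum_{(\beta_1 \cdots \beta_b)} p_{[\alpha_1 \cdots \alpha_a \beta_1 \cdots \beta_b]},$$

where for a fixed b the second summation extends to all the combinations
$(\beta_1 \cdots \beta_b)$ of b members out of the "difference set" $(1, \ldots, n) - (\alpha_1 \cdots \alpha_a)$.

Now let each p in each S on the right-hand side of (1) be decomposed into a
sum of the $p_{[\cdots]}$'s in the last-written way. Consider a fixed

$$p_{[\mu_{11} \cdots \mu_{1j_1}\, \cdots\, \mu_{r1} \cdots \mu_{rj_r}]},$$

where for every $k = 1, \ldots, r$, $(\mu_{k1} \cdots \mu_{kj_k})$ is a combination of $j_k$ members out
of $(\nu_{k1} \cdots \nu_{kn_k})$. It appears once in exactly $\binom{j_1}{i_1}\binom{j_2}{i_2} \cdots \binom{j_r}{i_r}$ terms in
$S_{i_1, i_2, \ldots, i_r}$. Hence, its total contribution to the right-hand side of (1) is

$$\sum_{i_1=m_1}^{n_1} \cdots \sum_{i_r=m_r}^{n_r} (-1)^{i_1 + \cdots + i_r - m_1 - \cdots - m_r} \binom{i_1}{m_1} \cdots \binom{i_r}{m_r} \binom{j_1}{i_1} \cdots \binom{j_r}{i_r} = \prod_{k=1}^{r} \left[ \sum_{i_k=m_k}^{j_k} (-1)^{i_k - m_k} \binom{i_k}{m_k}\binom{j_k}{i_k} \right]$$

$$= \prod_{k=1}^{r} \begin{cases} 1 & \text{if } j_k = m_k \\ 0 & \text{otherwise} \end{cases} \;=\; \begin{cases} 1 & \text{if } j_k = m_k \text{ for every } k = 1, \ldots, r \\ 0 & \text{otherwise.} \end{cases}$$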
Therefore after the decompositions and the collecting of terms, the only p's
remaining on the right-hand side of (1) are those in which for every $k = 1, \ldots, r$
we have $j_k = m_k$. Thus the right-hand side is reduced to

$$\sum p_{[\mu_{11} \cdots \mu_{1m_1}\, \cdots\, \mu_{r1} \cdots \mu_{rm_r}]},$$

where the summation extends to all those combinations of $\mu$'s such that for
every $k = 1, \ldots, r$, $(\mu_{k1} \cdots \mu_{km_k})$ is a combination of $m_k$ members out of
$(\nu_{k1} \cdots \nu_{kn_k})$. This is clearly equal to the left-hand side of (1). Q. E. D.
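The formula just proved can be checked by brute force on a small example. The sketch below is an illustration under stated assumptions, not part of the original argument: it builds an arbitrary joint distribution over which of n events occur, then compares the directly computed probability of "exactly $m_k$ events in the kth set, for every k" with the alternating sum of formula (1). All function names are hypothetical, and exact rational arithmetic is used so that the comparison is an equality.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb
import random


def random_joint(n, seed=0):
    """Arbitrary probabilities for the 2**n outcomes; bit i of an outcome says E_i occurred."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def prob_all(joint, idx):
    """Probability that every event in idx occurs, without regard to the others."""
    mask = sum(1 << i for i in idx)
    return sum(p for outcome, p in enumerate(joint) if (outcome & mask) == mask)


def exact_direct(joint, groups, ms):
    """Probability that exactly ms[k] events of groups[k] occur, for every k."""
    total = Fraction(0)
    for outcome, p in enumerate(joint):
        counts = [sum((outcome >> i) & 1 for i in g) for g in groups]
        if counts == list(ms):
            total += p
    return total


def exact_formula(joint, groups, ms):
    """Right-hand side of the generalized formula (1)."""
    total = Fraction(0)
    ranges = [range(m, len(g) + 1) for g, m in zip(groups, ms)]
    for i_s in product(*ranges):
        sign = (-1) ** (sum(i_s) - sum(ms))
        coef = 1
        for i, m in zip(i_s, ms):
            coef *= comb(i, m)
        # S_{i_1,...,i_r}: sum over one combination of i_k members from each set
        s = Fraction(0)
        for choice in product(*(combinations(g, i) for g, i in zip(groups, i_s))):
            s += prob_all(joint, [e for part in choice for e in part])
        total += sign * coef * s
    return total


if __name__ == "__main__":
    joint = random_joint(5)
    groups = [(0, 1, 2), (3, 4)]
    for ms in product(range(4), range(3)):
        assert exact_direct(joint, groups, ms) == exact_formula(joint, groups, ms)
    print("formula (1) agrees with direct enumeration")
```

With a single set containing all n events the same code reduces to the classical formula for $p_{[m]}(1, \ldots, n)$.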

If we replace "exactly $m_k$" by "at least $m_k$" in the definition of the probability
just considered, we replace in our notation the square-bracketed $[m_k]$ by an
unbracketed $m_k$ and we replace in our formula $\binom{i_k}{m_k}$ by $\binom{i_k - 1}{m_k - 1}$. This is proved
as before, noting that we have

$$\sum_{i_k=m_k}^{j_k} (-1)^{i_k - m_k} \binom{i_k - 1}{m_k - 1}\binom{j_k}{i_k} = 1 \qquad \text{for } j_k = m_k, \ldots, n_k;$$

an identity which can be proved by induction on $j_k$.

A parallel generalization of Poincaré's formula is as follows: We ask for the
probability that either out of the first set exactly $m_1$ events occur; or out of the
second exactly $m_2$; ...; or finally, out of the rth set exactly $m_r$. That is,
instead of repeated conjunctions we may consider repeated disjunctions. We
denote the required probability by the left-hand side of (2); then it is given in
terms of the p's defined above in (1) by the right-hand side below:


(2) IFI,-,-,(m r ](yu j ' ' * j *0ni ! •'il • ” Part, ", ' ‘ ‘ J Vrl ' ’ ' VrH f ) 


— pm\ • * ■, m r *H * 

Other events symmetrical with respect to each of the sub-sets, in whose def¬ 
inition the words "and”, "or”, "exactly”, "at least” appear arbitrarily, may be 
considered. 

Lastly, we only mention that as a first application the formula (1) can be used
to establish the formula

$$(n - k) \sum p_m(\nu_1 \cdots \nu_k) = (k + 1 - m) \sum p_m(\nu_1 \cdots \nu_{k+1}) + m \sum p_{m+1}(\nu_1 \cdots \nu_{k+1}),$$

first obtained by P. L. Hsu. For its significance we may refer to [1], and a
continuation of that paper to be published shortly.

REFERENCE

[1] K. L. Chung, "On the probability of the occurrence of at least m events among n
arbitrary events," Annals of Math. Stat., Vol. 12 (September 1941).



TABLES FOR TESTING RANDOMNESS OF GROUPING
IN A SEQUENCE OF ALTERNATIVES

By Frieda S. Swed and C. Eisenhart
University of Wisconsin

When two different kinds of objects are arranged along a line they will form
two or more distinct groups of like objects. Thus, in the arrangement: aabbbab,
there are 3 a's and 4 b's forming 4 groups. In general, if there are m objects
of one kind and n objects of another kind, there are in all

$$C^{m+n}_{m} = C^{m+n}_{n}$$

different arrangements possible. There will be no loss of generality if we assume
that $m \le n$.

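If u is defined to be the number of distinct groups of like objects in any one
arrangement, then the proportion of arrangements yielding u' or less groups is¹

(1) $$P\{u \le u'\} = \frac{1}{C^{m+n}_{m}} \sum_{u=2}^{u'} f_u,$$

where

$$f_u = 2\,C^{m-1}_{k-1}\,C^{n-1}_{k-1} \quad \text{when } u = 2k, \text{ i.e. } u \text{ is even,}$$

and

$$f_u = C^{m-1}_{k-1}\,C^{n-1}_{k-2} + C^{m-1}_{k-2}\,C^{n-1}_{k-1} \quad \text{when } u = 2k - 1, \text{ i.e. } u \text{ is odd,}$$

for $k = 1, 2, \ldots, m + 1$. For example, if m = n = 5, then

$$P\{u = 2\} = \frac{f_2}{C^{10}_{5}} = \frac{2\,C^{4}_{0}\,C^{4}_{0}}{C^{10}_{5}} = \frac{1}{126},$$

$$P\{u = 3\} = \frac{f_3}{C^{10}_{5}} = \frac{C^{4}_{1}\,C^{4}_{0} + C^{4}_{0}\,C^{4}_{1}}{C^{10}_{5}} = \frac{8}{252}.$$

In a random arrangement (1) is the probability of $u \le u'$.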
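The distribution (1) is straightforward to evaluate exactly. The following sketch is an illustration only (the names `f_u`, `runs_cdf` and `count_groups` are not from the original); it implements the even and odd cases of $f_u$ with integer binomial coefficients and returns $P\{u \le u'\}$ as an exact fraction.

```python
from fractions import Fraction
from math import comb


def f_u(u, m, n):
    """Number of arrangements of m and n objects that form exactly u groups (u >= 2)."""
    if u % 2 == 0:
        k = u // 2
        return 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
    k = (u + 1) // 2  # u = 2k - 1, so k >= 2 whenever u >= 3
    return (comb(m - 1, k - 1) * comb(n - 1, k - 2)
            + comb(m - 1, k - 2) * comb(n - 1, k - 1))


def runs_cdf(u_max, m, n):
    """P{u <= u_max} under random arrangement, as an exact fraction."""
    favourable = sum(f_u(u, m, n) for u in range(2, u_max + 1))
    return Fraction(favourable, comb(m + n, m))


def count_groups(sequence):
    """Number of distinct groups of like objects in a sequence."""
    return 1 + sum(a != b for a, b in zip(sequence, sequence[1:]))


if __name__ == "__main__":
    print(count_groups("aabbbab"))   # groups in the arrangement used as an example
    print(runs_cdf(2, 5, 5))         # P{u = 2} for m = n = 5
    print(float(runs_cdf(5, 5, 20)))
```

Because `math.comb` returns zero when the lower index exceeds the upper, values of u beyond the largest attainable number of groups contribute nothing, so the function can be called with any $u' \ge 2$.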

The following tables have been prepared for use in testing data for randomness
and for testing whether two samples are from the same population.² Table
I gives $P\{u \le u'\}$ to 7 decimal places for $m \le n \le 20$ with a range of m from
2 to 20 inclusive, whereas Table II gives correct values for $u_\epsilon$ for $\epsilon$ = .005, .01,
.025, .05, .95, .975, .99 and .995, where $u_\epsilon$ is the largest integer u' for which
$P\{u \le u'\} \le \epsilon$ when $\epsilon < .50$, and is the smallest integer u' for which $P\{u \le u'\} \ge \epsilon$
when $\epsilon > .50$. This table was obtained from Table I and covers the same
range of values of m and n. Table III gives values of $u_\epsilon$ for m = n from 10 to
100. Those values of $u_\epsilon$ were obtained by using the normal approximation given
on page 151 of the Wald-Wolfowitz paper together with a correction for
continuity not given in their article; this correction improved the approximation
for small values of m and n. The values of $u_\epsilon$ for m = n = 10 through 20 are
included in Table III, although they can be obtained from Table II, in order to
check on the adequacy of the approximation. These values obtained with the
approximation check with those of Table II except for the five underscored
values. It appears that the approximation will be adequate in general for
m = n > 20.

¹ W. L. Stevens, "Distribution of Groups in a Sequence of Alternatives" (Annals of
Eugenics, Vol. IX, Part I (1939) pp. 10-17).

² A. Wald and J. Wolfowitz, "On a Test Whether Two Samples are from the Same
Population" (Annals of Math. Stat., Vol. XI, No. 2, June (1940) pp. 147-162).

To illustrate the use of these tables to test randomness of an arrangement,³
consider a case where one might suspect nonrandomness and, more specifically,
expect too few groups. The arrangement of diseased and healthy plants in a
row of a field might be such a case. For example, we might have the following
plant arrangement:

H H H H H H H H H H D H D D D D H H H H H H H H H,

where

m = 5, the number of diseased plants present,
n = 20, the number of healthy plants present,
u' = 5, the number of groups actually formed.

From Table I the probability associated with this arrangement is found to be
.0183512, which is the probability of $u \le u'$. Since P < .05, we might elect
to regard this as evidence of a tendency for the disease to be nonrandomly
distributed among the plants in a row, knowing that if we look for an
explanation of "clustering" whenever $P\{u \le u'\} < .05$ we may expect to follow a false
scent not more than one time in twenty in the long run.

When a control chart’ suggests the presence of assignable causes of variation 
in a manufactured product flowing from a production line, an examination of 
various types of runs, e.g. the lengths and relative frequency of runs above and 
below the median of a sequence of values, may assist in diagnosing the nature 
of the cause. Dr. Walter A. Shewhart has given us such an instance: A se¬ 
quence of observations dealing with corrosion suggested the presence of an 
assignable cause of variation. By the use of run charts an assignable cause of 
variation was tracked down in the measuring apparatus and an attempt was 
made to eliminate it. The original sequence examined with regard to runs 
above and below the median of the sequence exhibited an unexpectedly large 
number of runs of length 7 or more and as a result a significantly low value of 

³ W. L. Stevens (ibid.).

American Defense Emergency Standards Z1.1 and Z1.2 entitled "Guide for Quality
Control" and "Control Chart Method of Analyzing Data" and American War Standard
Z1.3 entitled "Control Chart Method of Controlling Quality During Production"
(published by the American Standards Association, New York City).

u, and, if the assignable cause were not completely eliminated in the new design,
we might expect too large a proportion of long runs above and below the median, 
and, hence, too few total runs. A sequence of 40 observations taken with the 
new measuring device yielded a total of 15 runs above and below the median of 
the sequence which is significantly fewer than would be expected to arise under 
a state of statistical control, since for m = n = 20, $P\{u \le 15\} = .038$. This
sequence is of special interest since the occurrence of too few runs suggested
the assignable cause had not been entirely eliminated although no especially
long runs, say of length 7 or more, occurred in this sequence, so that from the 
point of view of length of runs without regard to their number the assignable 
cause might have been judged to have been eliminated. 

As an instance where too many groups would be the probable alternative to
randomness consider the arrangement of occupied and unoccupied seats at a
lunch counter about half an hour before the popular lunch hour begins. In
such a case the critical region would be $u \ge u'$ and the appropriate probability
would be $P = 1 - P\{u \le u' - 1\}$. Such a situation was observed and yielded
the following arrangement of empty and occupied seats along the lunch counter:

EOEEOEEEOEEEOEOE,

m = 5,
n = 11,
u' = 11,

P = 1 - .9423077 = .0576923;

and though this probability is not quite significant, the arrangement observed
has the maximum number of groups of empty and occupied seats for the m and
n of the size observed since no two occupied seats are adjacent. However, if
another customer had entered and sat either in the 5th empty seat from the
left or in the 8th empty seat, the number of groups would have been increased
by two and the situation would be:


m = 6,
n = 10,
u' = 13,

P = 1 - .9895105 = .0104895.

This P value is significant, and for this assumed case, as well as for the actual
case observed, the arrangement of E's and O's has the maximum number of
groups of like objects. Certainly both of these cases exhibit too many groups
to be considered random arrangements.

The use of these tables to test whether two samples constitute independent
random samples from the same population⁴ can be illustrated by using the data
of Snedecor's Example 4.11 on page 75 of his Statistical Methods (3d edition),
which gives daily gains in two lots of steer calves on two different rations. The
daily rates of gain given for the two lots are:

I. 1.95, 2.17, 2.06, 2.11, 2.24, 2.52, 2.04, 1.95;

V. 1.82, 1.85, 1.87, 1.74, 2.04, 1.78, 1.76, 1.86.

Arranging these rates in order of magnitude, designating a calf on ration I by
italics and one from V by ( ), we have

(1.74), (1.76), (1.78), (1.82), (1.85), (1.86), (1.87), 1.95, 1.95,

(2.04), 2.04, 2.06, 2.11, 2.17, 2.24, 2.52.

Whence

m = 8,
n = 8,
u' = 4,

P = .0088578.

Accordingly, at either the .05 or .01 level of significance rejection of the null
hypothesis that the two samples constitute independent random samples from
the same population is indicated.

⁴ Wald and Wolfowitz (ibid.) have pointed out that exceptionally small values of
u are to be regarded as evidence for rejecting this null hypothesis.
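The two-sample calculation above can be retraced mechanically. The sketch below is illustrative and rests on one assumption made here: tied values are ordered first one way and then the other, and the group count is taken under each order. Function and variable names are hypothetical.

```python
from fractions import Fraction
from math import comb


def runs_cdf(u_max, m, n):
    """Exact P{u <= u_max} for m and n objects in random order."""
    total = 0
    for u in range(2, u_max + 1):
        if u % 2 == 0:
            k = u // 2
            total += 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        else:
            k = (u + 1) // 2
            total += (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                      + comb(m - 1, k - 2) * comb(n - 1, k - 1))
    return Fraction(total, comb(m + n, m))


def count_groups(labels):
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


lot_I = [1.95, 2.17, 2.06, 2.11, 2.24, 2.52, 2.04, 1.95]
lot_V = [1.82, 1.85, 1.87, 1.74, 2.04, 1.78, 1.76, 1.86]

pooled = [(x, "I") for x in lot_I] + [(x, "V") for x in lot_V]

# Order ties both ways: lot I first within a tie, then lot V first.
order_a = [lab for _, lab in sorted(pooled, key=lambda t: (t[0], t[1] == "V"))]
order_b = [lab for _, lab in sorted(pooled, key=lambda t: (t[0], t[1] == "I"))]

groups_a = count_groups(order_a)
groups_b = count_groups(order_b)
p_value = runs_cdf(groups_a, len(lot_I), len(lot_V))

if __name__ == "__main__":
    print(groups_a, groups_b, float(p_value))
```

For these data both orderings of the tied pair give the same number of groups, so a single probability results.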

For these data we note the fact that having two identical values, i.e. 2.04, in
the two lots did not alter the number of groups regardless of whether they were
recorded as (2.04), 2.04 or as 2.04, (2.04). However, such duplications in general
may be more bothersome, since they may yield different values of u' depending
on the order in which they are considered. In such instances both possible
orders should be considered.

The merit of this test is that it employs a minimum of assumptions: merely
that the common population be continuous, and that the samples be drawn at
random independently. Its principal defect is its lack of power. As a
consequence gross disparity between the samples is generally required to render
$u' \le u_\epsilon$. Therefore, when additional assumptions are tenable, tests utilizing
them should be employed.

Most of the computing and checking of these tables was done by Frieda S.
Swed, Philip Ritz and Beatrice E. Kelley with some assistance from Jay Grodman,
Edward Halamka and Mrs. Henry Wallman. Also, Duane Borst and
Francis Cox helped with the typing and the proofing of the tables.



TABLE I
$P\{u \le u'\}$

When m = n, the largest possible value of u' is 2m; when m < n, the largest possible value of u' is 2m + 1.
NOTES

This section is devoted to brief research and expository articles, and notes on
methodology.


A NOTE ON THE BEST LINEAR ESTIMATE

By Allen T. Craig
University of Iowa
1. Introduction. Let the chance variable x be subject to the distribution
function D(x) and as usual let E[g(x)] denote the mathematical expectation of
the function g(x). If $x_1, x_2, \ldots, x_n$ constitute a sample of n independent values
of x, the function $y = c_1x_1 + c_2x_2 + \cdots + c_nx_n$ is frequently called the best
linear estimate of E(x) when the c's are so chosen that E(y) = E(x), and
$E[y - E(x)]^2 = \sigma_y^2$ is a minimum. It is the purpose of this note to give an example of
an estimate y, best in the sense defined, yet such that, y' being another estimate,

$$\Pr[E(x) - \delta < y < E(x) + \delta] < \Pr[E(x) - \delta < y' < E(x) + \delta],$$

for every $\delta > 0$.

2. The rectangular distribution. Consider D(x) = 1/a, $0 \le x \le a$, and let
the n items of each sample be arranged in ascending order of magnitude so that
$x_1 \le x_2 \le \cdots \le x_n$, $n \ge 2$. The generating function G(t) of the moments of
the distribution of $y = c_1x_1 + c_2x_2 + \cdots + c_nx_n$ is

$$G(t) = E(e^{ty}) = \frac{n!}{a^n} \int_0^a \int_0^{x_n} \cdots \int_0^{x_2} e^{t(c_1x_1 + \cdots + c_nx_n)}\, dx_1\, dx_2 \cdots dx_n.$$

Thus

$$E(y) = G'(0) = \frac{a}{n+1}\,[c_1 + 2c_2 + 3c_3 + \cdots + nc_n]$$

and

$$E(y^2) = G''(0) = \frac{a^2}{(n+1)(n+2)}\,\bigl[1\cdot 2c_1^2 + 2\cdot 3c_2^2 + \cdots + n(n+1)c_n^2 + 2\{1\cdot 3c_1c_2 + 1\cdot 4c_1c_3 + \cdots + 1\cdot(n+1)c_1c_n + 2\cdot 4c_2c_3 + \cdots + 2(n+1)c_2c_n + \cdots + (n-1)(n+1)c_{n-1}c_n\}\bigr].$$

From E(y) = E(x) = a/2, we have

$$c_1 = \tfrac{1}{2}(n+1) - 2c_2 - \cdots - nc_n.$$

Thus $\sigma_y^2 = G''(0) - a^2/4$ with $c_1$ in $G''(0)$ replaced by $\tfrac{1}{2}(n+1) - 2c_2 - \cdots - nc_n$.
From $\partial\sigma_y^2/\partial c_j = 0$, $j = 2, 3, \ldots, n$, we obtain the following system of n - 1
non-homogeneous linear equations in n - 1 unknowns,

$$4c_2 + 6c_3 + \cdots + 2nc_n = n + 1$$
$$6c_2 + 12c_3 + \cdots + 4nc_n = 2(n + 1)$$
$$8c_2 + 16c_3 + \cdots + 6nc_n = 3(n + 1)$$
$$\cdots$$
$$2nc_2 + 4nc_3 + \cdots + 2n(n-1)c_n = (n-1)(n+1).$$

Since the determinant of the coefficients is not zero, the solution $c_2 = c_3 = \cdots =
c_{n-1} = 0$, $c_n = (n+1)/2n$, is unique. Further, we see that $c_1 = 0$ so the best
linear estimate of the mean of the rectangular population is $y = (n+1)x_n/2n$,
where $x_n$ is the largest item in the sample.
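The conclusion that all the weight falls on the largest item can be verified numerically. The sketch below is an illustration, not the derivation used in the note: it takes a = 1, uses the first and second moments of the ordered sample implied by $G'(0)$ and $G''(0)$, and minimizes $E(y^2)$ subject to $E(y) = 1/2$ by a Lagrange-multiplier solution, which is an approach chosen here for brevity. The helper names are hypothetical.

```python
from fractions import Fraction


def solve(a_rows, b):
    """Solve a square linear system by Gauss-Jordan elimination over exact fractions."""
    size = len(a_rows)
    aug = [row[:] + [rhs] for row, rhs in zip(a_rows, b)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][size] for r in range(size)]


def best_linear_weights(n):
    """Weights c_1..c_n minimizing E(y^2) subject to E(y) = 1/2, for a = 1."""
    idx = range(1, n + 1)
    # E(x_i) = i/(n+1); E(x_i x_j) = i(j+1)/((n+1)(n+2)) for i <= j
    mean = [Fraction(i, n + 1) for i in idx]
    second = [[Fraction(min(i, j) * (max(i, j) + 1), (n + 1) * (n + 2)) for j in idx]
              for i in idx]
    w = solve(second, mean)                      # proportional to the optimum
    scale = Fraction(1, 2) / sum(wi * mi for wi, mi in zip(w, mean))
    return [scale * wi for wi in w]


if __name__ == "__main__":
    print(best_linear_weights(5))
```

Since the constraint fixes E(y), minimizing $E(y^2)$ is the same as minimizing $\sigma_y^2$, so the weights returned are those of the best linear estimate in the sense defined above.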
The distribution function of y is readily found to be

$$D(y) = n\left[\frac{2n}{a(n+1)}\right]^n y^{n-1}, \qquad 0 \le y \le \frac{(n+1)a}{2n}.$$

From this, it follows that $\sigma_y^2 = \dfrac{a^2}{4n(n+2)}$.

It has long been known¹ that the sampling distribution of the statistic
$\omega = \tfrac{1}{2}(x_1 + x_n)$, where $x_1$ and $x_n$ are respectively the smallest and largest items
in samples of size n from a rectangular population, has a smaller variance than
does that of the arithmetic mean $\bar{x}$ of all n items. The distribution function
of $\omega$ is

$$D(\omega) = \frac{2^{n-1} n}{a^n}\,\omega^{n-1}, \qquad 0 \le \omega \le \tfrac{1}{2}a,$$

$$D(\omega) = \frac{2^{n-1} n}{a^n}\,(a - \omega)^{n-1}, \qquad \tfrac{1}{2}a \le \omega \le a,$$

so that $E(\omega) = \tfrac{1}{2}a$ and $\sigma_\omega^2 = \dfrac{a^2}{2(n+1)(n+2)}$. Thus $\sigma_\omega^2 = 2\sigma_y^2$, approximately.

Yet Pitman has recently proved that for every $\delta > 0$, $\Pr[E(x) - \delta < \omega <
E(x) + \delta]$ exceeds the probability that any other estimate, including y, will fall
in this interval of length $2\delta$ about the mean a/2.

If we write $u = \dfrac{y - a/2}{\sigma_y}$ and $v = \dfrac{\omega - a/2}{\sigma_\omega}$, then the limits of D(u) and D(v)
as n approaches infinity are respectively $e^{u-1}$, $-\infty < u \le 1$, and $\dfrac{1}{\sqrt{2}}\,e^{-\sqrt{2}\,|v|}$,
$-\infty < v < \infty$. Thus neither y nor $\omega$ has an asymptotic normal distribution.
It is, of course, this fact which makes the criterion of minimum variance illusory.

¹ R. A. Fisher, "Theoretical foundations of mathematical statistics," Phil. Trans.
Roy. Soc. London, Series A, Vol. 222 (1921), pp. 309-363.


3. Other polynomial distribution functions. Let repeated samples of n
independent values of x be drawn from a population characterized by
$D(x) = \dfrac{k+1}{a^{k+1}}\,x^k$, $0 \le x \le a$, and k a positive integer or zero. It can be shown that the
best linear estimate of the mean of the population is

$$y = \frac{(k+1)n + 1}{n(k+2)}\,x_n,$$

where as before $x_n$ is the largest item of the sample. The sampling distribution
of y is easily obtained. It follows that

$$\sigma_y^2 = \frac{(k+1)a^2}{(k+2)^2[(k+1)n^2 + 2n]} = \frac{k+3}{n(k+1)+2}\,\sigma_{\bar{x}}^2,$$

where as usual $\bar{x}$ is the arithmetic mean of the sample. Again, if we write
$u = (y - E(x))/\sigma_y$, the limit of the distribution of u as n approaches infinity
is, as before, $e^{u-1}$, $-\infty < u \le 1$.


A NOTE ON TOLERANCE LIMITS


By Edward Paulson¹
Columbia University


Among various statistical problems arising in the process of controlling quality
in mass production, a rather important one appears to be the determination of
tolerance limits when the variability of the product is known to be due to
random factors. This problem was recently treated in a pioneer article by Wilks.
This note will point out a relationship between tolerance limits and confidence
limits (used in the sense of Neyman), and will use this concept to establish
tolerance limits when the product is described by two qualities, the measurements
on which are assumed to have a bivariate normal distribution.
For the case of a single variate, the problem of finding tolerance limits as
stated by Wilks is to find a sample size n, and two functions $L_1(x_1 \cdots x_n)$ and
$L_2(x_1 \cdots x_n)$ so that if $P = \int_{L_1}^{L_2} f(x)\,dx$ denotes the conditional probability of
a future observation falling between the random variates $L_1$ and $L_2$, then

$$E(P) = \alpha, \quad \text{and} \quad \text{Prob.}[\alpha - \Delta_1 < P < \alpha + \Delta_2] \ge p.$$

The relationship between confidence limits and tolerance limits will arise if
confidence limits are determined, not for a parameter of the distribution, but for
a future random observation (or for some function of the observations in a future
independent sample). This is based on the following simple lemma: If confidence
limits $U_1(x_1 \cdots x_n)$ and $U_2(x_1 \cdots x_n)$ on a probability level = $\alpha_0$ are determined for
g, a function of a future sample of k observations, and $P = \int_{U_1}^{U_2} \psi(g)\,dg$, then
$E(P) = \alpha_0$. For let $\psi(g)\,dg$ and $\varphi(U_1, U_2)\,dU_1\,dU_2$ denote the distribution of g and $U_1$, $U_2$
respectively, then by the definition of expected value

$$E(P) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[\int_{U_1}^{U_2} \psi(g)\,dg\right] \varphi(U_1, U_2)\,dU_1\,dU_2.$$

This triple integral is however exactly the probability that g will lie between
$U_1$ and $U_2$, which by the nature of confidence limits must equal $\alpha_0$, which proves
the lemma. In a similar manner it follows that if on the basis of a given sample
an l dimensional confidence region is found for statistics $g_1, g_2, \ldots, g_l$ derived
from a future sample, and if P denotes the probability that $g_1 \cdots g_l$ all fall in
the confidence region, then E(P) in repeated sampling equals $\alpha_0$. To establish
tolerance limits, it is necessary in addition to E(P) to also know the distribution
of P, or at least $\sigma_P^2$, so the distribution of P can be approximated.

¹ Work done under a grant-in-aid from the Carnegie Corporation of New York.

It appears, at least on an intuitive basis, that the "best" confidence interval
can be used to determine the shape of the "most efficient" tolerance limits; this
intuitive notion will gain additional support from the character of the tolerance
region which will now be derived for an observation (x, y) from a distribution
with probability density f(x, y), where

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - m_x)^2}{\sigma_x^2} - \frac{2\rho(x - m_x)(y - m_y)}{\sigma_x\sigma_y} + \frac{(y - m_y)^2}{\sigma_y^2}\right]\right\}.$$

Suppose we have 2 independent samples

$[(x_1, y_1)(x_2, y_2) \cdots (x_n, y_n)]$ and $[(x, y)]$

both from f(x, y). Then it is known that

$$T^2 = \frac{n}{(n+1)(1 - r^2)}\left[\frac{(x - \bar{x})^2}{s_x^2} - \frac{2r(x - \bar{x})(y - \bar{y})}{s_x s_y} + \frac{(y - \bar{y})^2}{s_y^2}\right],$$

where $\bar{x} = \sum_{i=1}^{n} x_i/n$, $s_x^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)$, etc., has the distribution of
Hotelling's Generalized Student Ratio [2]. A confidence region for a future
observation (x, y) on the basis of a sample of n on a level of significance = $\alpha$ will
be given by the elliptic region $T^2 \le T_\alpha^2$ (in the x, y plane), where $T_\alpha^2 = 2(n - 1)
F_\alpha/(n - 2)$, where $F_\alpha$ is the value of the F distribution (with $n_1 = 2$ and $n_2 =
n - 2$ degrees of freedom) which is exceeded with probability = $1 - \alpha$.
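For numerical work the limit $T_\alpha^2$ can be obtained without tables of F. The sketch below relies on a standard property that is not stated in the text and is assumed here: for $n_1 = 2$ and $n_2 = \nu$ degrees of freedom, the probability that F exceeds f is $(1 + 2f/\nu)^{-\nu/2}$, so that $F_\alpha$ has a closed form. The function name is illustrative.

```python
from math import log


def t2_limit(n, alpha):
    """T_alpha^2 = 2 (n - 1) F_alpha / (n - 2) for a sample of n pairs.

    Assumes the closed form for the upper tail of F with (2, n - 2) degrees of
    freedom, P(F > f) = (1 + 2 f / (n - 2)) ** (-(n - 2) / 2).
    """
    nu = n - 2
    f_alpha = (nu / 2.0) * ((1.0 - alpha) ** (-2.0 / nu) - 1.0)
    return 2.0 * (n - 1) * f_alpha / nu


if __name__ == "__main__":
    for n in (10, 30, 214, 10 ** 6):
        print(n, round(t2_limit(n, 0.99), 4))
    print("large-sample limit:", round(-2.0 * log(1.0 - 0.99), 4))
```

As n increases the limit decreases toward $-2\log(1 - \alpha)$, which is the corresponding percentage point of $\chi^2$ with 2 degrees of freedom.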

If P denotes the probability of a future observation falling in this ellipse, then
$P = \iint_{T^2 \le T_\alpha^2} f(x, y)\,dx\,dy$. By utilizing the fact [2] that $T^2$ is invariant under linear
transformations, it is not difficult to see that the distribution of P will not
involve any unknown parameters, so its distribution can be calculated under the
assumption $m_x = m_y = \rho = 0$, $\sigma_x = \sigma_y = 1$. Then

$$P = F(\bar{x}, \bar{y}, s_x, s_y, r) = \frac{1}{2\pi} \iint_{T^2 \le T_\alpha^2} e^{-\frac{1}{2}(x^2 + y^2)}\,dx\,dy.$$

We know that $E(P) = \alpha$, and we will now calculate the variance of P by
expanding P in a Taylor Series (to terms of the first order) about the point $\bar{x} = 0$,
$\bar{y} = 0$, r = 0, $s_x^2 = 1$, $s_y^2 = 1$. P can clearly be put in the form


P 


-If 

2 tt J 


2,r •V-pV"’ 11 -, 


e eh I " ,J 




dy 


Taking derivatives and evaluating ahold the* population values 



2 

— 



Since for ordinary values of $\alpha$ ($\alpha$ = .95 or .99) the distribution of P approaches
normality very slowly, we will follow a suggestion of Wilks and suppose
that a fairly close approximation to the distribution of P will be given by

(1) $$\frac{\Gamma(u + v)}{\Gamma(u)\,\Gamma(v)}\,P^{u-1}(1 - P)^{v-1}\,dP,$$

where $u = [\alpha^2(1 - \alpha) - \alpha\sigma_P^2]/\sigma_P^2$,

$v = [\alpha(1 - \alpha)^2 - (1 - \alpha)\sigma_P^2]/\sigma_P^2$.
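The two constants are what one obtains by matching the mean and variance of a beta distribution to $\alpha$ and $\sigma_P^2$. A short sketch, using exact fractions and an arbitrary illustrative value of the variance (the value below is not taken from the note), confirms this:

```python
from fractions import Fraction


def beta_parameters(alpha, var_p):
    """Constants u, v of the approximating distribution for given mean and variance."""
    u = (alpha ** 2 * (1 - alpha) - alpha * var_p) / var_p
    v = (alpha * (1 - alpha) ** 2 - (1 - alpha) * var_p) / var_p
    return u, v


def beta_mean_and_variance(u, v):
    """Mean and variance of the distribution with density proportional to P^(u-1) (1-P)^(v-1)."""
    mean = u / (u + v)
    variance = u * v / ((u + v) ** 2 * (u + v + 1))
    return mean, variance


if __name__ == "__main__":
    alpha = Fraction(99, 100)
    var_p = Fraction(1, 40000)      # illustrative value only
    u, v = beta_parameters(alpha, var_p)
    print(u, v, beta_mean_and_variance(u, v))
```

The check is exact: the mean and variance recovered from u and v equal the values supplied.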

This distribution can now be used to establish tolerance limits. For example,
it follows from (1) that for a sample size $n \ge 214$, and a tolerance region given
by the ellipse $T^2 = 9.21$, then E(P) = .99 and the Prob.{.985 < P < .995} $\ge$
.992.

Care must be taken in the use of these and similar results, for if the
distribution is not a bivariate normal one, a large error may be introduced which will
not be eliminated with increasing n; however the error will probably be small
when a tolerance region is found for the means $\bar{x}$, $\bar{y}$ of a future sample of k
observations (k > 20) as contrasted with a tolerance region for a single observation.
An exact treatment of the case when the bivariate distribution is unknown has
been given by Wald in the present issue of the Annals of Mathematical Statistics.
A NEW APPROXIMATION TO THE LEVELS OF SIGNIFICANCE
OF THE CHI-SQUARE DISTRIBUTION

By Leo A. Aroian
Hunter College

Recent articles on the percentage points of the $\chi^2$ distribution [1], [2], have
directed my attention to a method proposed in my investigation of Fisher's z
distribution [3], a method particularly useful and easily computed for n large.
In addition, this method avoids interpolation. If $t = \dfrac{\chi^2 - n}{\sqrt{2n}}$ and $\alpha_3 = \dfrac{2\sqrt{2}}{\sqrt{n}}$,
the measure of skewness for the $\chi^2$ distribution, the following formulas give
significance levels of t as quadratic functions of $\alpha_3$, $t = a + b\alpha_3 + c\alpha_3^2$. The values
of a, b, and c were found by the usual method of least squares, fitting each formula
to the values of t [4] for $\alpha_3$ = 0, ±0.1, ±0.2, ±0.3, and ±0.4. Then the value
of a in each instance was adjusted to give the proper value of t when $\alpha_3 = 0$; e.g.,
the constant term by the method of least squares for the 1 per cent point is
2.32633 which we change to 2.32635. The range $|\alpha_3| \le .4$ corresponds to $n \ge
50$, but the formulas are quite satisfactory for $n \ge 30$. Formulas for t when
$|\alpha_3| > .4$ [3] are easily derived, but such results while more accurate in the range
$30 \le n < 50$ would be considerably less accurate in the region $n \ge 50$. After t
is calculated, $\chi^2 = n + \sqrt{2n}\,t$. The formulas are:


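(1)
$t_{50\%} = -.16636\alpha_3$
$t_{40\%} = .25335 - .15567\alpha_3 - .012270\alpha_3^2$
$t_{30\%} = .52440 - .12058\alpha_3 - .021511\alpha_3^2$
$t_{25\%} = .67449 - .090013\alpha_3 - .030693\alpha_3^2$
$t_{20\%} = .84162 - .048433\alpha_3 - .036788\alpha_3^2$
$t_{10\%} = 1.28155 + .107033\alpha_3 - .04797\alpha_3^2$
$t_{5\%} = 1.64485 + .28392\alpha_3 - .04902\alpha_3^2$
$t_{2.5\%} = 1.95997 + .47228\alpha_3 - .0430[\text{illegible}]\,\alpha_3^2$
$t_{1\%} = 2.32635 + .73330\alpha_3 - .024957\alpha_3^2$
$t_{.5\%} = 2.5758 + .93600\alpha_3 - .00377\alpha_3^2$
$t_{.1\%} = 3.0903 + 1.4190\alpha_3 + .05067\alpha_3^2$
$t_{.01\%} = 3.7200 + 2.1260\alpha_3 + .1749\alpha_3^2$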

The maximum error for t in the range $|\alpha_3| \le .4$ is 2 in the fourth significant
figure, 1 in the fourth significant figure, 6 in fifth, 3 in fifth, 3 in fifth, 1 in fifth,
1 in fifth, 3 in fifth, 4 in fifth, 4 in fifth, 4 in fifth and 4 in fourth significant figures
for the .01%, .1%, .5%, 1%, 2.5%, 5%, 10%, 20%, 25%, 30%,
40%, and 50% points respectively. The error increases outside the indicated
range. In addition

(2) $t_{99.99\%} = -3.7200 + 2.1260\alpha_3 - .1749\alpha_3^2$

$t_{99.9\%} = -3.0903 + 1.4190\alpha_3 - .05067\alpha_3^2$

and similarly for other percentage points. These are obtained from (1) by
replacing $\alpha_3$ by $-\alpha_3$ and t by $-t$.

We compare results obtained by these methods against those of Wilson and
Hilferty [2]. In all cases except at the 95% level the method here proposed is
superior. Table I compares the two methods. It was copied from [2] except
for the corrections in the Wilson and Hilferty method for the 95% level and in
the accurate value for $\chi^2$ at the 5% level for n = 75, 96.2160 in place of 96.11.
Table II gives comparisons for other levels when n = 30.
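The computation described above, from $\alpha_3$ to t to $\chi^2$, takes only a few lines. The sketch below is illustrative: it carries only those upper percentage points whose coefficients are fully legible in the formulas as reproduced here, and the names `COEFFS` and `chi2_point` are hypothetical.

```python
from math import sqrt

# (a, b, c) in t = a + b*alpha3 + c*alpha3**2, keyed by upper-tail probability.
COEFFS = {
    0.10:  (1.28155, 0.107033, -0.04797),
    0.05:  (1.64485, 0.28392, -0.04902),
    0.01:  (2.32635, 0.73330, -0.024957),
    0.005: (2.5758, 0.93600, -0.00377),
    0.001: (3.0903, 1.4190, 0.05067),
}


def chi2_point(p, n):
    """Approximate value of chi-square with n degrees of freedom exceeded with probability p."""
    a, b, c = COEFFS[p]
    alpha3 = sqrt(8.0 / n)              # skewness of chi-square, 2*sqrt(2)/sqrt(n)
    t = a + b * alpha3 + c * alpha3 ** 2
    return n + sqrt(2.0 * n) * t


if __name__ == "__main__":
    for p, n in ((0.05, 75), (0.01, 30), (0.01, 100)):
        print(p, n, round(chi2_point(p, n), 3))
```

For the 5% level with n = 75 the sketch agrees with the accurate value quoted above to within a few units in the third decimal place. Lower percentage points follow from the same coefficients by changing the sign of $\alpha_3$ and of t, as in (2).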
NEWS AND NOTICES

Readers are invited to submit to the Secretary of the Institute news items of
general interest.

Personal Items 

Associate Professor H. P. Evans of the Mathematics Department of the Uni¬ 
versity of Wisconsin has been promoted to a professorship. 

Assistant Professor Willy Feller of the Mathematics Department of Brown 
University has been promoted to an associate professorship. 

Dr. Carl F. Kossack of the Mathematics Department of the University of 
Oregon has been promoted to an assistant professorship. 

Dr. Eugene Lukacs has been appointed to an assistant professorship in the
Mathematics Department of Illinois College.

Professor E. B. Mode has been made chairman of the Mathematics Department
of Boston University.

Mr. Charles R. Mummery has been made Product Quality Engineer at the
Scioto Ordnance Plant of the U. S. Rubber Company.

Professor H. L. Rietz has retired after twenty-five years of service as Head of 
the Mathematics Department of the University of Iowa. 


The Foundation for the Study of Cycles has announced that a medal will be 
awarded to the individual making the most significant contribution to cycle re¬ 
search during 1943. Communications should be addressed to: Professor Ells¬ 
worth Huntington, Yale University, New Haven, Connecticut. 


Obituary 

Professor Edward L. Dodd of the Mathematics Department of the University
of Texas died on January 9, 1943 at the age of sixty-seven years. He was a
charter member of the Institute. He was elected as one of the Vice-Presidents of
the Institute for 1943. His contributions to mathematical statistics consist of 
numerous research papers on probability, on general mean functions of statistical 
variables, and on statistical theory of periodicities. 


Stanford Courses in Statistical Methods of Quality Control

A new procedure in adult education, and particularly in statistical education,
was inaugurated at Stanford University, when courses in the Shewhart
methods of quality control were offered in short intensive courses.

and h^r 0 T"? ° n f ° n the Campus at Stanford University, July 17-26, 
fuU davs° and S “ The <*urse covered ten 

days idSmVaSdudrf MU "“ “ gh * ho ” S per d! *' S “ to - 
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The features of the course may be described by the following points: 

1. The courses were short, thus making it possible for men in industry to
attend.

2. The number of hours' instruction was sufficient to cover the field adequately.

3. The instruction covered a wide range of points of view.

4. The students were picked delegates sent by industry.

5. The courses are being followed up with monthly meetings in Los Angeles and
San Francisco.

Intensive courses of this character were first suggested by Dr. W. Edwards
Deming in April of 1942, while he was temporarily detailed to the office of the
Chief of Ordnance in the War Department, and the first course actually
commenced just three months later. By giving the course to men already in
industry, the yield obtained was manyfold higher than can be expected from a
regular college course. Contributions and reports made by the delegates
subsequent to the courses supply abundant foundation for this statement.

West Coast industry and the Army and Navy ordnance districts sent 32
delegates to the first course, and 31 to the second. Through the efforts of Professor
Eugene Grant of Stanford, industry and the Army and Navy were persuaded to
send some of their most valued officials. The instruction was organized by
Professor Holbrook Working. Both he and Professor Grant took an active part
in the instruction, which was supplemented in both courses by Dr. W. Edwards
Deming as an exponent of government and industrial sampling. In the first
course, Mr. Charles R. Mummery of the Hoover Company served as an
instructor from the viewpoint of industry. In the second course (the one in Los
Angeles), Mr. Ralph E. Wareham of the General Electric Company occupied the
industrial corner of the square of instruction. The expense of the instructors
was paid out of ESMWT funds (Office of Education). Monthly follow-up courses
in San Francisco and Los Angeles, under the direction of Professors Working and
Grant, supply the necessary power for maintaining momentum, and for gathering
the men together for directed study and consultation.

The demand for men trained in this line far exceeds the supply, and there are
movements afoot to provide similar courses in a number of industrial cities.
Three-day courses in a dozen or more key ordnance cities were held last fall by
the Ordnance Department. The lecturers were Messrs. G. D. Edwards and
Harold F. Dodge of the Bell Telephone Laboratories, and Mr. G. Rupert Gause of
the Aberdeen Proving Ground, now with the Army Ordnance in Washington.
These courses and the Stanford courses alleviated the situation considerably, but
further instruction is needed.


Junior Membership in the Institute 

At the annual election for 1942, which was held by mail ballot because of the
postponement of the Annual Meeting, constitutional amendments were approved
which created a new grade of membership in the Institute, known as Junior
Membership. It is hoped that this provision for Junior Membership will
stimulate interest in mathematical statistics at the advanced undergraduate
level in colleges and universities.

The Board of Directors have approved the following rules governing Junior 
Membership: 

1. Any undergraduate student of a collegiate institution is eligible for election 
as a Junior member of the Institute of Mathematical Statistics provided 
that he or she is sponsored by a member of the Institute.

2. The annual dues ($2.50) must be submitted with the application.

3. Annual membership shall coincide with the calendar year and the Junior 
Member shall receive a complete volume of the Annals of Mathematical 
Statistics for the year in which he or she is elected. 

4. Junior Membership shall be limited to a term of two years, but a Junior 
Member may apply for transfer to ordinary membership at the beginning 
of his second year. 

For the convenience of any Institute member who may wish to sponsor a 
Junior Member an application blank is provided at the back of this issue of the 
Annals. Additional blanks may be obtained from the Secretary of the Insti¬ 
tute. 


Announcement of May Meeting in New York 

There will be a joint meeting between the Institute and the American Society 
of Mechanical Engineers on Saturday, May 29, 1943, at the Engineering Societies
Building, 29 West 39th Street, N. Y. 

The meeting will consist of two sessions on industrial applications of
mathematical statistics. The topics are as follows:

Morning Session, 10 A.M.

Chairman: Harold Hotelling

1. J. Wolfowitz, On the Theory of Runs with some Applications to Quality Control.

2. Churchill Eisenhart, On the Presentation of Data as Evidence.

Afternoon Session, 2 P.M.

Chairman: W. A. Shewhart

1. H. F. Dodge, A Sampling Inspection Plan for Continuous Production. 

2. L. C. Young, Tolerances and Product Acceptability. 


ANNUAL REPORT OF THE PRESIDENT OF THE INSTITUTE 

Ordinarily at the business meeting and at the luncheon customarily held as
part of the annual meeting of the Institute, the President has the opportunity to
express thanks to those individuals, aside from the officers, who
served the Institute during the year, and to have his say concerning past progress
and future plans. This year it appears that the pages of the Annals must be
used for this purpose.





As the result of proposals made and approved at our last regular annual
meeting in New York City, a larger number of special committees than usual were
appointed for 1942 and thus more members than before were specifically asked to
participate in the affairs of the Institute. The Institute is much indebted to
these individuals for the way in which they responded.

Professors A. T. Craig, Harold Hotelling, and S. S. Wilks, Chairman,
constituted a committee to study the Board of Directors of the Institute, and the
formal connection between the Institute and its journal, The Annals of
Mathematical Statistics. Their recommendations were incorporated in amendments
to the constitution and by-laws recently approved by the Institute. The Board
was increased in size and given greater continuity by including in it the two
previous presidents, and the editor of the Annals, ex officio, and by increasing
the term of the Secretary-Treasurer to three years.

The new class of junior memberships is the result of a study of this question by
a committee composed of Professors J. H. Bushey, Boyd Harshbarger, and G. W.
Snedecor, Chairman. Regulations, since approved by the Board, under which
local chapters of the Institute may be formed, were drawn up by a committee
consisting of Dr. C. F. Kossack, Professor H. D. Larsen, and Professor B. H.
Camp, Chairman.

Dr. L. A. Aroian, Dr. J. F. Daly, Mr. H. F. Dodge, and Professor W. D. Baten,
Chairman, as a committee agreed to assist the Institute by endeavoring to bring
the Annals of Mathematical Statistics to the favorable attention of municipal,
industrial, and college libraries which had not been subscribers to it. As a result
of their fine work, a good number of domestic libraries has been added to our
subscription list, thus serving to counterbalance our losses abroad.

The Program Committees for the year consisted of Professor Churchill
Eisenhart and Mr. E. C. Molina, Chairman, for the September meeting in
Poughkeepsie, New York, and of Professor P. S. Dwyer and Dr. W. E. Deming,
Chairman, for the projected Cleveland meeting. The Institute is always much
indebted to those who do the work of arranging its programs for meetings, but this
year we owe special acknowledgement to Dr. Deming, who prepared an excellent
program for Cleveland, then one for a New York meeting under extremely short
notice when the Cleveland meeting was cancelled, and then had that meeting also
cancelled. Dr. W. R. Van Voorhis acted as our representative on the Committee
on Local Arrangements for the meeting planned for Cleveland.

The membership committee appointed for 1942 was made up of Dr. W. E.
Deming, Professor E. L. Dodd, and Professor A. T. Craig, Chairman. After
Professor Craig took up his commission in the Navy, Professor B. H. Camp
agreed to take his place on this committee.

For some years Professor A. T. Craig has generously acted as custodian of our
files of back numbers of the Annals. This service to the Institute has been
taken over by Professor L. A. Knowler, who has been of much assistance. Dr.
W. E. Blanche did a considerable amount of work in connection with finding
advertisers for the Annals.

Our annual meetings had been increasingly successful in recent years and it
was a real sacrifice for the Institute to forego the one planned for 1942. Though
the war is demonstrating in still more ways and places the importance of sound
statistical methods, for the present it imposes serious responsibilities on the
friends of the Institute. A reading of the report of our faithful and efficient
Secretary-Treasurer will amplify this statement. Of the present Board,
Professors Olds, Wilks and Craig met in Pittsburgh January 23 and 24 to consider
some of our problems. Though there seems no prospect of a national meeting in
the coming year, it is hoped that some local meetings can be held and that in
other ways we can keep up the activities of the Institute. In particular there
exists the opportunity of organizing local chapters of the Institute in the larger
centers, which would be particularly valuable now. In industrial areas we may
contribute to the war effort as well as promote an important aspect of
mathematical statistics by endeavoring to be useful in the development and application
of industrial statistics. It is clear that the Institute needs the loyal support of
its membership now as much as ever before if it is to fulfill the functions for which
it was founded.

Cecil C. Craig, 

President. 

December 31, 1942 


ANNUAL REPORT OF THE SECRETARY-TREASURER OF THE 

INSTITUTE 

On September 8-9 the Institute met at Vassar College in conjunction with the
American Mathematical Society and the Mathematical Association of America.
Mr. E. C. Molina and Professor Churchill Eisenhart were in charge of the
program. Fifty-eight members of the Institute attended the meeting. The Annual
Meeting, originally scheduled for Cleveland then transferred to New York City,
was finally postponed at the request of the Office of Defense Transportation. At
the present time it seems that this meeting will have to be abandoned entirely
and the Institute must be content with holding local meetings in some of the
larger cities.

Because of the postponement of the Annual Meeting, the annual election was
held by mail. The following officers were elected: Professor Cecil C. Craig,
President; Professors Edward L. Dodd and Abraham Wald, Vice-Presidents;
and Professor Edwin G. Olds, Secretary-Treasurer. Nine amendments to the
Constitution and six amendments to the By-Laws were proposed and accepted by
a two-thirds majority of those voting. Professor K. L. Fetters acted as teller.

During the past year the Secretary has cooperated with industrial concerns and
government agencies in locating statistically trained personnel to fill positions
created by the emergency. Members of the Institute are requested to keep the
Secretary informed regarding the availability of such personnel.

The death of one member of the Institute has been reported since the
annual meeting: Dr. Robert Henderson, former Vice-President and Actuary of
the Equitable Life Assurance Society.

The following financial statement covers the period from December 10, 1941 to
December 10, 1942 (the books and records of the Treasurer have been audited by
Mr. George E. Niver and found to be in agreement with the statement as
submitted):

FINANCIAL STATEMENT
December 10, 1941 to December 10, 1942

RECEIPTS

Balance on Hand, December 10, 1941 ....................  $1,561.54
Dues ..................................................   2,203.15
Subscriptions .........................................   1,289.74
Sales of Back Numbers .................................   1,393.65
Cumulative Index ......................................       5.00
Miscellaneous .........................................      28.50

Total Receipts ........................................  $6,571.58

EXPENDITURES

Annals Office
  Editorial Expenses ..................................    $127.98
Waverly Press
  Printing and Mailing Annals, 4 issues ...............   3,227.34
Back Numbers Office
  Purchase of back numbers from H. C. Carver ..  $355.77
  Reprinting 300 copies of Vol. V, No. 4 ......   142.16
                                                             497.93
Library Committee .....................................      10.77
Secretary-Treasurer's Office
  Printing and Supplies .......................   $53.58
  Binding .....................................    30.00
  Postage .....................................   139.16
  Clerical Help ...............................
                                                             438.24
Printing Programs for Meetings ........................     101.11
Miscellaneous .........................................       7.08

Total Expenditures ....................................  $4,416.45

Balance on Hand, December 10, 1942 ....................   2,155.13

                                                         $6,571.58

In comparison with the financial condition of the Institute at the end of 1941,
the receipts from dues, subscriptions, and sales of back numbers have increased
more than $800. This is mostly due to a large increase in the sales of back
numbers and a net increase of fifty members. The increase in expenditures of the
Institute was accounted for by the increased cost in printing the Annals. This
marks the beginning of a trend which seems likely to continue throughout the war.









It would seem over-optimistic to expect that the financial situation of the
Institute would continue to show marked improvement in 1943. Present
indications suggest that we shall be very fortunate to avoid a considerable deficit in
operations. A large number of our foreign subscribers have not renewed and we
face considerable difficulty in delivery of the Annals to those still in force. The
large increase in the sales of back numbers was due to a rather successful effort to
persuade domestic libraries to provide themselves with complete sets of back
numbers while the issues were still available. The Institute faces an increase in
operating expenses and an advance in the cost of producing the Annals. The
full cooperation of all members is needed if we are to avoid a decrease in the work
of the Institute during 1943.

Edwin G. Olds,
Secretary-Treasurer.

December 31, 1942.


On behalf of the Board of Directors of the Institute, I regret to announce the
sudden death of Vice-President E. L. Dodd, on January 9, 1943, shortly after
this report was written. Dr. W. E. Deming was appointed by the Board of
Directors to fill the vacancy created by Vice-President Dodd's death.

E. G. O.


CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 
ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics.
2. Its object shall be to promote the interests of mathematical statistics.

ARTICLE II

Membership

1. The membership of the Institute shall consist of Members, Junior Members, Fellows,
Honorary Members, and Sustaining Members.

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior
Members excepted, who have been members for twenty-three months prior to the date of

3. No person shall be a Junior Member of the Institute for more than a limited term as
determined by the Committee on Membership and approved by the Board of Directors.

ARTICLE III

Officers, Board of Directors, and Committee on Membership

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a
Secretary-Treasurer. The terms of office of the President and Vice-Presidents shall be
one year and that of the Secretary-Treasurer three years. Elections shall be by majority
ballot at Annual Meetings of the Institute. Voting may be in person or by mail.

(a) Exception. The first group of Officers shall be elected by a majority vote of
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers, the two previous
Presidents, and the Editor of the Official Journal of the Institute.

3. The Institute shall have a Committee on Membership composed of three Fellows.
At their first meeting subsequent to the adoption of this Constitution, the Board of
Directors shall elect three members as Fellows to serve as the Committee on Membership,
one member of the Committee for a term of one year, another for a term of two years,
and another for a term of three years. Thereafter the Board of Directors shall elect from
among the Fellows one member annually at their first meeting after their election for a
term of three years. The President shall designate one of the Vice-Presidents as Chairman
of this Committee.

ARTICLE IV
Meetings

1. A meeting for the presentation and discussion of papers, for the election of Officers,
and for the transaction of other business of the Institute shall be held annually at such
time as the Board of Directors may designate. Additional meetings may be called from
time to time by the Board of Directors and shall be called at any time by the President
upon written request from ten Fellows. Notice of the time and place of meeting shall be
given to the membership by the Secretary-Treasurer at least thirty days prior to the date
set for the meeting. All meetings except executive sessions shall be open to the public.
Only papers accepted by a Program Committee appointed by the President may be
presented to the Institute.

2. The Board of Directors shall hold a meeting immediately after their election and
again immediately before the expiration of their term. Other meetings of the Board may
be held from time to time at the call of the President or any two members of the Board.
Notice of each meeting of the Board, other than the two regular meetings, together with a
statement of the business to be brought before the meeting, must be given to the members
of the Board by the Secretary-Treasurer at least five days prior to the date set therefor.
Should other business be passed upon, any member of the Board shall have the right to
reopen the question at the next meeting.

3. The Committee on Membership shall hold a meeting immediately after the annual
meeting of the Institute. Further meetings of the Committee may be held from time to
time at the call of the Chairman or any member of the Committee provided notice of such
call and the purpose of the meeting is given to the members of the Committee by the
Secretary-Treasurer at least five days before the date set therefor. Should other business
be passed upon, any member of the Committee shall have the right to reopen the question
at the next meeting.

4. At a regularly convened meeting of the Board of Directors, four members shall
constitute a quorum. At a regularly convened meeting of the Committee on Membership,
two members shall constitute a quorum.


ARTICLE V
Publications

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute.
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the
Board of Directors of the Institute. The term of office of the Editor may be terminated at
the discretion of the Board of Directors.

2. Other publications may be originated by the Board of Directors as occasion arises.

ARTICLE VI 
Expulsion or Suspension

1. Except for non-payment of dues, no one shall be expelled or suspended except by
action of the Board of Directors with not more than one negative vote.

ARTICLE VII
Amendments

1. This constitution may be amended by an affirmative two-thirds vote at any regularly
convened meeting of the Institute provided notice of such proposed amendment shall have
been sent to each voting member by the Secretary-Treasurer at least thirty days before the
date of the meeting at which the proposal is to be voted upon. Voting may be in person or
by mail.


BY-LAWS 

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and Committee on Membership

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, shall
preside at the meetings of the Institute and of the Board of Directors. At meetings of the
Institute, the presiding officer shall vote only in the case of a tie, but at meetings of the
Board of Directors he may vote in all cases. At least three months before the date of the
annual meeting, the President shall appoint a Nominating Committee of three members.
It shall be the duty of the Nominating Committee to make nominations for Officers to be
elected at the annual meeting and the Secretary-Treasurer shall notify all voting members
at least thirty days before the annual meeting. Additional nominations may be
submitted in writing, if signed by at least ten Fellows of the Institute, up to the time of the
meeting.

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings at
the meetings of the Institute and of the Board of Directors, send out calls for said meetings
and, with the approval of the President and the Board, carry on the correspondence of the
Institute. Subject to the direction of the Board, he shall have charge of the archives and
other tangible and intangible property of the Institute, and once a year he shall publish in
the Annals of Mathematical Statistics a classified list of all Members and Fellows of the
Institute. He shall send out calls for annual dues and acknowledge receipt of same; pay
all bills approved by the President for expenditures authorized by the Board or the
Institute; keep a detailed account of all receipts and expenditures, prepare a financial statement
at the end of each year and present an abstract of the same at the annual meeting of the
Institute after it has been audited by a Member or Fellow of the Institute appointed by the
President as Auditor. The Auditor shall report to the President.

3. Subject to the direction of the Board, the Editor shall be charged with the
responsibility for all matters concerning the editing of the Annals of Mathematical
Statistics. He shall, with the advice and consent of the Board, appoint an Editorial
Committee of not less than twelve members to co-operate with him; four for a period of five years,
four for a period of three years, and the remaining members for a period of two years,
appointments to be made annually as needed. All appointments to the Editorial
Committee shall terminate with the appointment of a new Editor. The Editor shall serve as
editorial adviser in the publication of all scientific monographs and pamphlets authorized
by the Board.

4. The Board of Directors shall have charge of the funds and of the affairs of the
Institute, with the exception of those affairs specifically assigned to the President or to the
Committee on Membership. The Board shall have authority to fill all vacancies ad
interim, occurring among the Officers, Board of Directors, or in any of the Committees. The
Board may appoint such other committees as may be required from time to time to carry
on the affairs of the Institute.

5. The Committee on Membership shall prepare and make available through the
Secretary-Treasurer an announcement indicating the qualifications requisite for the different
grades of membership.

ARTICLE II 
Dues 

1. Members shall pay five dollars at the time of admission to membership and shall receive
the full current volume of the Official Journal. Thereafter, Members shall pay five
dollars annual dues. The annual dues of Junior Members shall be two dollars and fifty cents.
The annual dues of Fellows shall be five dollars. The annual dues of Sustaining Members
shall be fifty dollars. Honorary Members shall be exempt from all dues.

(a) Exception. In the case that two Members of the Institute are husband and wife
and they elect to receive between them only one copy of the Official Journal, the annual
dues of each shall be three dollars and seventy-five cents.

2. Annual dues shall be payable on the first day of January of each year.

3. The annual dues of a Fellow, Member, or Junior Member include a subscription to the
Official Journal. The annual dues of a Sustaining Member include two subscriptions to
the Official Journal.

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues
may be six months in arrears, and to accompany such notice by a copy of this Article. If
such person fail to pay such dues within three months from the date of mailing such notice,
the Secretary-Treasurer shall report the delinquent one to the Board of Directors, by whom
the person's name may be stricken from the rolls and all privileges of membership
withdrawn. Such person may, however, be re-instated by the Board of Directors upon
payment of the arrears of dues.


ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any
committee.


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a
majority vote at any regularly convened meeting of the Institute, if the proposed
amendment has been previously approved by the Board of Directors.



Application for Junior Membership

Date ______________

I am an undergraduate and hereby apply for Junior Membership in the
Institute of Mathematical Statistics.

Signature ______________

Name ______________

Mailing Address ______________

College or University ______________

Sponsored by ______________

(Please print)


Applications and dues should be mailed to
EDWIN G. OLDS, Treasurer
Carnegie Institute of Technology
Pittsburgh, Pennsylvania


ON TRANSFORMATIONS USED IN THE ANALYSIS OF VARIANCE

By J. H. Curtiss 
Cornell University 

1. Introduction. Transformations of variates to render their distributions
more tractable in various ways have long been used in statistics [12, chapter
XVI]. The present extensive use of the analysis of variance, particularly as
applied to data derived from designs such as randomized blocks and Latin
squares, has placed new emphasis on the usefulness of such transformations.
In the more usual significance tests associated with the analysis of variance, it
is assumed a priori that the plot yields are statistically independent normally
distributed variates which all have the same variance, but which have possibly
different means. The hypotheses to be tested are then concerned with relations
among these means. But in practice, it sometimes seems appropriate to specify
for each variate a distribution in which the variance depends functionally upon
the mean; moreover, in such cases, the specification is generally not normal.
For example, when the data is in the form of a series of counts or percentages, a
Poisson exponential or binomial specification may seem in order, and the
variance of either of these distributions is functionally related to the mean of the
distribution. Before applying the usual normal theory to such data, it is
clearly desirable to transform each variate so that normality and a stable
variance are achieved as nearly as possible.

Various transformations have been devised to do this, and a number of articles
explaining the nature and use of these transformations have recently been
published.¹ However, the available literature on the subject appears to be
mainly descriptive and non-mathematical. The object of this paper is to
provide a general mathematical theory (sections 2 and 3) for certain types of
transformations now in use. In the framework of this theory we shall discuss in
particular the square root and inverse sine transformations (section 4), and also
several logarithmic transformations (section 4 and section 5).

2. General theory. As it arises in the analysis of variance, the problem of
stabilizing a variance functionally related to a mean may be stated as follows:
Suppose $X$ is a variate whose mean $\mu = E(X)$ is a real variable with a range $S$ of
possible values, and whose standard deviation $\sigma = \sigma_X = \sigma(\mu)$ is a function of $\mu$
not identically constant. Required, to find a function $T = f(X)$ such that
both $f(X)$ and $\sigma_T^2 = E\{[T - E(T)]^2\}$ are functionally independent of $\mu$ for $\mu$
on $S$. (By "functionally independent," we mean that $\partial f/\partial\mu = 0$ and
$d\sigma_T^2/d\mu = 0$ for $\mu$ on $S$.)


¹ See references [1], [2], [3], [4], [5], [6], [13], [16].

The following line of argument is adopted in certain of the references
mentioned above ([1], [2], [3], [4]): From the relation $dT = f'(X)\,dX$, we deduce as
an approximation by some sort of summation process that $\sigma_T = f'(\mu)\sigma(\mu)$.
Setting this expression equal to a constant, say $c$, we obtain $f'(\mu) = c/\sigma(\mu)$,
so $f(x)$ is an indefinite integral of $c/\sigma(x)$. The roughness of the approximation
used here is only too apparent.² For example, if $X$ is normally distributed, then
the variance of $T = X^2$ as given by the approximation is $4\sigma^2\mu^2$, while actually
it is $4\sigma^2\mu^2 + 2\sigma^4$.
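
The size of the discrepancy in the normal example can be checked directly from the raw moments of the normal distribution. The following is a minimal sketch, not part of the original argument; the function names are hypothetical and chosen only for illustration. It compares the heuristic value $[f'(\mu)\sigma]^2$ for $f(x) = x^2$ with the exact variance $E(X^4) - [E(X^2)]^2$.

```python
def normal_raw_moments(mu, sigma):
    """Second and fourth raw moments of a normal variate (mean mu, s.d. sigma)."""
    m2 = mu**2 + sigma**2
    m4 = mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4
    return m2, m4


def approx_var_of_square(mu, sigma):
    # Heuristic rule: sigma_T ~ f'(mu) * sigma, with f(x) = x**2 so f'(mu) = 2 * mu.
    return (2 * mu * sigma) ** 2


def exact_var_of_square(mu, sigma):
    # Var(X^2) = E(X^4) - [E(X^2)]^2, which simplifies to 4*sigma^2*mu^2 + 2*sigma^4.
    m2, m4 = normal_raw_moments(mu, sigma)
    return m4 - m2**2


if __name__ == "__main__":
    for mu, sigma in [(10.0, 1.0), (1.0, 1.0), (0.0, 1.0)]:
        print(mu, sigma, approx_var_of_square(mu, sigma), exact_var_of_square(mu, sigma))
```

The neglected term $2\sigma^4$ is small relative to $4\sigma^2\mu^2$ only when $\mu$ is large compared with $\sigma$; at $\mu = 0$ the heuristic gives zero variance although the exact variance is $2\sigma^4$.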

Indeed, it is easily seen that in important special cases the problem of
stabilization as above stated could have no solution other than the trivial one in
which $T$ is identically constant on the set of points of increase of the d.f.³ of $X$.
For instance, if $X$ has a Poisson exponential distribution, then the identity
$E\{(f(X) - E[f(X)])^2\} = c$, or $E\{[f(X)]^2\} = c + \{E[f(X)]\}^2$, becomes

$$\sum_{k=0}^{\infty} [f(k)]^2 \frac{e^{-\mu}\mu^k}{k!} = c + \left[\sum_{k=0}^{\infty} f(k) \frac{e^{-\mu}\mu^k}{k!}\right]^2, \qquad \mu > 0.$$

Expanding both sides in powers of $\mu$, we need only equate the coefficients of the
zero-th power of $\mu$ on each side to find that $[f(0)]^2 = c + [f(0)]^2$, which implies
that $c = 0$ and hence that $f(0) = f(1) = f(2) = \cdots$. A similar demonstration
can be given for the case in which $X$ has a binomial distribution with a fixed
number of values of the variate.
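
The Poisson case can also be examined numerically. The sketch below is an illustration under stated assumptions, not the paper's method: the helper names are hypothetical, and the infinite Poisson series is truncated at a point chosen so that the neglected tail is negligible. It evaluates the exact variance of $f(X)$ for a Poisson variate and, taking $f(x) = \sqrt{x}$ as a trial transformation, shows that the variance still changes with $\mu$.

```python
import math


def poisson_pmf(k, mu):
    # exp(-mu) * mu**k / k!, evaluated through logarithms for numerical stability.
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_var_of(f, mu):
    """Variance of f(X) for X Poisson with mean mu > 0 (series truncated far in the tail)."""
    k_max = int(mu + 12 * math.sqrt(mu)) + 40
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    mean = sum(p * f(k) for k, p in enumerate(probs))
    return sum(p * (f(k) - mean) ** 2 for k, p in enumerate(probs))


if __name__ == "__main__":
    for mu in [0.1, 1.0, 10.0, 100.0]:
        print(mu, poisson_var_of(math.sqrt, mu))
```

For small $\mu$ the variance of $\sqrt{X}$ is close to zero, since $X$ is then almost always 0, whereas for large $\mu$ it settles near $1/4$. This agrees with the conclusion that no non-constant $f$ gives an exactly constant variance, and it motivates the asymptotic formulation of section 3.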

As to the problem of choosing $T = f(X)$ so that its distribution is exactly
normal, we can observe immediately that a single-valued function $f(X)$ will
never transform a variate $X$ with a discrete distribution into a variate with a
continuous one. On the other hand, any variate $X$ with a continuous d.f.
$F(x)$ can be transformed into a normally distributed variate $T$ by the
transformation $T = f(X)$ defined by the equation

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{T} e^{-t^2/2}\,dt = F(X).$$

However, aside from the practical difficulty of solving this equation for $T$, the
resulting function $T = f(X)$ will not generally be functionally independent of
the mean of $X$.
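
The dependence on the mean can be seen in a small example. The sketch below assumes that the normalizing transformation equates the standard normal d.f. at $T$ with $F(X)$, and it takes an exponential variate as the continuous distribution; both choices, and the function names, are illustrative assumptions rather than anything stated in the paper. It uses `statistics.NormalDist.inv_cdf` from the Python standard library to solve for $T$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def exponential_cdf(x, mean):
    # d.f. of an exponential variate with the given mean, for x > 0.
    return 1.0 - math.exp(-x / mean)


def normalizing_transform(x, mean):
    """Solve Phi(T) = F(x) for T, where F is the exponential d.f. with the given mean."""
    return STD_NORMAL.inv_cdf(exponential_cdf(x, mean))


if __name__ == "__main__":
    for mean in [1.0, 2.0, 5.0]:
        print(mean, [round(normalizing_transform(x, mean), 4) for x in (0.5, 1.0, 2.0)])
```

The same observed value $x$ is sent to different values of $T$ as the mean changes, so the function $f$ cannot be written down without knowing the mean. This is the difficulty the text points out.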

These considerations lead us to seek asymptotic solutions to the problems of 
normalization and stabilization. Such solutions are considered in the next 
section. 

² Tippett [14] says "This derivation is not mathematically sound, and the result is only
justified if on application it is found to be satisfactory."

³ I.e., distribution function. For any given one-dimensional variate $X$ we shall denote
the probability or relative frequency assigned to a set $R$ by $P(R)$. The d.f. of the variate
then is the point function $F(x) = P(X \le x)$. This function is sometimes called the
cumulative frequency function of $X$.

3. Asymptotic theorems. In the remainder of this paper, we shall suppose
that the distribution of $X$ depends on a parameter $n$ which is to tend somehow to
infinity. The mean $\mu = \mu_n$ of $X$, with range $S_n$, will in general depend upon $n$
(although by this we do not mean to exclude the case in which $\mu_n$ is constant for
all values of $n$), and perhaps will depend also on some further independent
parameters, which we shall denote collectively by $\theta$, with range $\Sigma$. We shall
seek a variate $T = f(X)$, in which $f(X)$ is functionally independent of $\mu$ and of
the parameters $\theta$ for $\mu$ on $S_n$, $\theta$ on $\Sigma$, and such that the distribution of
$f(X) - f(\mu_n)$ tends as $n \to \infty$ to a normal distribution, while
$\lim_{n\to\infty} \sigma_T^2 = c^2$, where $c^2$ is
an absolute constant. It is implied here that in case the additional parameters
$\theta$ are present, the function $f(X)$ may depend non-trivially on $n$; but if $n$ is the
only parameter on which the distribution of $X$ depends, then $f(X)$ must be
functionally independent of $n$.

A solution to the problem just proposed is given in certain cases by Theorems
3.1 and 3.2 below, which are suggested by the heuristic reasoning of the second
paragraph of section 2.

Theorem 3.1. Let $\psi_n(x)$ be a non-negative function of x and n, defined almost
everywhere and integrable⁴ with respect to x over any finite interval of the x-axis for
each n > 0. Let

$$T = f(X) = \int_a^X \psi_n(x)\,dx,$$

where a is an arbitrary constant. Let $F_n(y)$ be the d.f. of the variate
$Y = (X - \mu_n)\psi_n(\mu_n)$. Suppose further that a continuous d.f. $F(y)$ exists such that
$\lim_{n\to\infty} F_n(y) = F(y)$ for all values of y. Then either one of the following two
conditions is a sufficient condition for the d.f. $H_n(w)$ of the variate $W = f(X) - f(\mu_n)$
to tend uniformly to $F(w)$, $-\infty < w < +\infty$:

(a) To each w for which $0 < F(w) < 1$, there corresponds for all n sufficiently
large at least one root $x = x_n$ to the equation

$$\int_{\mu_n}^{x} \psi_n(u)\,du = w, \tag{3.1}$$

and this root $x_n$ has the property that

$$\lim_{n\to\infty} (x_n - \mu_n)\psi_n(\mu_n) = w. \tag{3.2}$$

(b) For all n sufficiently large, $\psi_n(\mu_n) > 0$, and $\lim_{n\to\infty} q_n(w) = 1$ uniformly in
any closed finite subinterval of the open interval defined by $0 < F(w) < 1$, where

$$q_n(w) = \frac{\psi_n\left(w[\psi_n(\mu_n)]^{-1} + \mu_n\right)}{\psi_n(\mu_n)}. \tag{3.3}$$

To prove this theorem we shall first suppose that condition (a) is satisfied.
Let $w_1$ and $w_2$ be the end points of the open interval (possibly infinite) defined by
$0 < F(w) < 1$. If w lies in this interval, and if n is large enough for the root
$x_n$ in (3.1) to exist, then from the monotonic character of $\int_a^X \psi_n(x)\,dx$ we can
infer that

$$\begin{aligned}
H_n(w) &= P[f(X) - f(\mu_n) \le w] = P\left[\int_{\mu_n}^{X} \psi_n(x)\,dx \le w\right] \\
&= P(X \le x_n) = P[Y \le (x_n - \mu_n)\psi_n(\mu_n)] \\
&= F_n[(x_n - \mu_n)\psi_n(\mu_n)].
\end{aligned} \tag{3.4}$$

⁴ "Integrable" here means absolutely integrable in the sense of Lebesgue.

Since F(w) is continuous, $\lim_{n\to\infty} F_n(w) = F(w)$ uniformly on any finite or
infinite interval of values of w, as is well known.⁵ Therefore $\lim_{n\to\infty} F_n(w_n) = F(w)$
if $\lim_{n\to\infty} w_n = w$. Thus from (3.2) and (3.4), we find that $\lim_{n\to\infty} H_n(w) = F(w)$
for $w_1 < w < w_2$.

If $w' \le w_1$, and $w_1 < w'' < w_2$, then $0 \le H_n(w') \le H_n(w'') = F(w'') + [H_n(w'') - F(w'')]$.
We can make the right hand member of this relation less
than any given positive number $\epsilon$ by first choosing $w''$ so that $F(w'') < \tfrac12\epsilon$ (it
will be remembered that F(w) is a continuous d.f., and $F(w_1) = 0$) and then
choosing n so large that the quantity in square brackets is also less than $\tfrac12\epsilon$ in
absolute value. Thus $\lim_{n\to\infty} H_n(w') = 0$. Similarly if $w' \ge w_2$, we can show
that $\lim_{n\to\infty} H_n(w') = 1$. Hence $\lim_{n\to\infty} H_n(w) = F(w)$ for all w, and it follows
that the limit is uniform on any finite or infinite interval of values of w.

We shall now show that condition (a) in the theorem is a consequence of
condition (b). The result follows at once from the following simple lemma:

Lemma. If $\gamma_n(w)$ is a non-negative function integrable over any finite interval
of values of w; and if $\lim_{n\to\infty} \gamma_n(w) = 1$ uniformly in any finite closed subinterval of
an interval $w_1 < w < w_2$, then for every value of w in this interval there exists for all
n sufficiently large a solution $y = y_n$ of the equation $\int_0^y \gamma_n(z)\,dz = w$, and the
solution $y_n$ has the property that $\lim_{n\to\infty} y_n = w$.

For it is clear that if w satisfies the inequality $w_1 < w < w_2$, and if $\eta > 0$
be chosen so that $w_1 < w - \eta < w + \eta < w_2$, then for all n sufficiently large,

$$\int_0^{w-\eta} \gamma_n(z)\,dz \le w \le \int_0^{w+\eta} \gamma_n(z)\,dz.$$

Thus for each n sufficiently large, there exists a root $y_n$ of the equation
$\int_0^y \gamma_n(z)\,dz = w$, and furthermore, this root satisfies the inequality
$w - \eta \le y_n \le w + \eta$. Since $\eta$ is arbitrarily small, the proof of the lemma is complete.
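The lemma can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the proof: it takes $\gamma_n(z) = (1 + 2z/\sqrt{n})^{-1/2}$ (the function that arises later for the square root transformation with $\alpha = 0$), locates the root $y_n$ by bisection on $[w - \eta, w + \eta]$, and lets one watch $y_n$ approach w. For this particular $\gamma_n$ the root is $w + w^2/(2\sqrt{n})$ in closed form, which gives a check on the numerical result.

```python
import math


def gamma_n(z: float, n: float) -> float:
    """Example integrand (1 + 2z/sqrt(n))^(-1/2) where defined, else 0."""
    base = 1.0 + 2.0 * z / math.sqrt(n)
    return base ** -0.5 if base > 0 else 0.0


def integral_0_to_y(y: float, n: float, steps: int = 2000) -> float:
    """Midpoint-rule value of the signed integral of gamma_n from 0 to y."""
    h = y / steps
    return sum(gamma_n((i + 0.5) * h, n) for i in range(steps)) * h


def root_y_n(w: float, n: float, eta: float = 0.5) -> float:
    """Bisection on [w - eta, w + eta] for the root of integral_0_to_y(y) = w.

    Assumes n is large enough that the root lies in this bracket.
    """
    lo, hi = w - eta, w + eta
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if integral_0_to_y(mid, n) < w:  # integral is non-decreasing in y
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in (4.0, 100.0, 10000.0):
        print(n, root_y_n(1.0, n))
```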

To apply the lemma, we make the change of variables $z = (u - \mu_n)\psi_n(\mu_n)$
in the integral in (3.1), which reduces it to the form

$$\int_0^y q_n(z)\,dz, \qquad y = (x - \mu_n)\psi_n(\mu_n), \tag{3.5}$$

and the conclusion that (a) is implied by (b) now follows at once.

⁵ See [7], Theorem 11, pp. 29-30, also [8].

We add the remark that the uniformity of the limit of $q_n(z)$ in condition (b)
may be replaced by the condition that for each closed finite sub-interval there
exists a function q(w) which dominates $q_n(w)$ for all n sufficiently large.

Our second theorem, which is stated in the terminology and notation of
Theorem 3.1, is concerned with the limit of the variance of T = f(X). From
the mere fact that the distribution of W tends to a limiting form, it by no means
follows that the mean and variance of the distribution of W approach those of
the limiting form, as may be shown by trivial examples. Thus additional
hypotheses on $\psi_n(x)$ and on the behavior of the distribution of Y become
necessary.

Theorem 3.2. Let T (or f(X)), Y, $F_n(y)$ and $F(y)$ be defined as in Theorem
3.1. Let the mean and variance of the distribution defined by F(y) exist and have
respective values 0 and $c^2$. Then the following three conditions, taken together, are
sufficient that

$$\lim_{n\to\infty} [E(T) - f(\mu_n)] = 0, \tag{3.6}$$

$$\lim_{n\to\infty} \sigma_T^2 = c^2: \tag{3.7}$$

(i) $E(Y^2)$ exists for n > 0, and $\lim_{n\to\infty} E(Y^2) = c^2$.

(ii) Condition (b) of Theorem 3.1 holds.

(iii) $f(Y[\psi_n(\mu_n)]^{-1} + \mu_n) - f(\mu_n) = O(|Y|)$ uniformly in n as $|Y| \to \infty$.

As a preliminary step in the proof, we observe that (i) and the relations
$\lim_{n\to\infty} F_n(y) = F(y)$, $c^2 = \int_{-\infty}^{+\infty} y^2\,dF(y)$, imply that the improper integral
$\int_{-\infty}^{+\infty} y^2\,dF_n(y)$ converges uniformly in n for n > 0. As the integrand is positive,
the following result is equivalent to the uniform convergence of the integral:
For every $\epsilon > 0$, there exist numbers $A_1$ and $A_2$, $A_1 < A_2$, such that for all n
sufficiently large,

$$\left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) y^2\,dF_n(y) < \epsilon.$$

To prove this, we write

$$\left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) y^2\,dF_n(y) = [E(Y^2) - c^2]
+ \left[\int_{A_1}^{A_2} y^2\,dF(y) - \int_{A_1}^{A_2} y^2\,dF_n(y)\right]
+ \left[c^2 - \int_{A_1}^{A_2} y^2\,dF(y)\right].$$

We first choose $A_1$ and $A_2$ so that the last bracket here is less than $\tfrac12\epsilon$ in absolute
value. By condition (i), the first bracket approaches zero as n tends to infinity,
and the Helly-Bray theorem [10, p. 15] states that the second bracket also
approaches zero as n tends to infinity, so for all n sufficiently large, the sum of the
first two brackets is in absolute value less than $\tfrac12\epsilon$.

It is important to notice that we can always choose $A_1$ and $A_2$ in the above
demonstration so that $A_1 > w_1$, $A_2 < w_2$, where $w_1$ and $w_2$ are as usual the
endpoints of the interval defined by $0 < F(w) < 1$.

To continue with the proof of the theorem, we remark that by a change of
variables similar to the one used to derive (3.5), the function $W = f(X) - f(\mu_n)$
may be expressed as a function of Y in the following manner:

$$W = \int_{\mu_n}^{X} \psi_n(x)\,dx = \int_0^{Y} q_n(w)\,dw = Q_n(Y),$$

where $q_n(w)$ is given by (3.3). In terms of W, (3.6) and (3.7) become,
respectively,

$$\lim_{n\to\infty} E(W) = 0, \tag{3.8}$$

$$\lim_{n\to\infty} \{E(W^2) - [E(W)]^2\} = c^2, \tag{3.9}$$

and these are the equations which we now establish.

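Conditions (ii) and (iii) obviously imply that $\lim_{n\to\infty} Q_n(y) = y$ uniformly in
any finite closed subinterval of the interval $w_1 < y < w_2$, and that a constant M
exists such that $|Q_n(y)| \le M|y|$ for all n. If $E(Y^2)$ exists, so will E(Y).
Now

$$\begin{aligned}
E(W) &= \int_{-\infty}^{+\infty} Q_n(y)\,dF_n(y) \\
&= \int_{-\infty}^{+\infty} Q_n(y)\,dF_n(y) - \int_{-\infty}^{+\infty} y\,dF_n(y) \\
&= \left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) [Q_n(y) - y]\,dF_n(y) + \int_{A_1}^{A_2} [Q_n(y) - y]\,dF_n(y),
\end{aligned}$$

where $w_1 < A_1 < A_2 < w_2$. Therefore

$$|E(W)| \le \left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) (M + 1)|y|\,dF_n(y) + \int_{A_1}^{A_2} |Q_n(y) - y|\,dF_n(y).$$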

From the uniform convergence of $\int_{-\infty}^{+\infty} y^2\,dF_n(y)$, proved above, we can conclude
that the pair of improper integrals in this inequality can be made less than an
arbitrary $\epsilon > 0$ by proper choice of $A_1$ and $A_2$. The third integral approaches
zero, by the general Helly-Bray Theorem [10, p. 16], and so becomes less than $\epsilon$
for all n sufficiently large. Thus we have established (3.8). To show that
(3.9) is true, we have merely to prove that $\lim_{n\to\infty} E(W^2) = c^2$. Since
$E(Y^2) = \int_{-\infty}^{+\infty} y^2\,dF_n(y)$, we may write

$$E(W^2) = E(Y^2) + \int_{-\infty}^{+\infty} \{[Q_n(y)]^2 - y^2\}\,dF_n(y).$$

The integral may be shown to approach zero by the argument used in the case of
E(W), and the required result then follows from condition (i) of the theorem.
The proof is now complete.

The sufficient conditions in Theorem 3.2 can be modified in various more or
less obvious ways. The existence of the limiting d.f. F(y) was essentially used
in the proof only to secure the uniform convergence of $\int_{-\infty}^{+\infty} y^2\,dF_n(y)$. Condition
(ii) can again be modified along the lines suggested at the end of the proof of
Theorem 3.1. Condition (iii) was used only to secure the uniform convergence
of the integral $\int_{-\infty}^{+\infty} [Q_n(y)]^2\,dF_n(y)$.

For later reference, we shall supplement Theorems 3.1 and 3.2 with the
following simple result, which is practically self-evident.

Theorem 3.3. Let the distribution of a variate Y depend upon a parameter n,
let $F_n(y)$ be the d.f. of Y, and let F(y) be a continuous d.f. with the property that
$\lim_{n\to\infty} F_n(y) = F(y)$. Let $a_n$ be a function of n such that $\lim_{n\to\infty} a_n = a \ne 0$.
Then the d.f. of the variate $Z = a_n Y$ tends as $n \to \infty$ to the d.f. $F(z/a)$ if a > 0,
and to the d.f. $1 - F(z/a)$ if a < 0. If the variance of Y exists and tends to $c^2$
as $n \to \infty$, then the variance of $a_n Y$ tends to $a^2 c^2$ as $n \to \infty$.

If F(y) is the d.f. of a reduced normal distribution, i.e.,

$$F(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\,dt,$$

then $F(z/a)$ is also the d.f. of a normal distribution with mean zero and variance
$a^2$. More generally, any affine transformation of a normal variate yields
another normal variate.


4. Applications. The theorems of the preceding section have the effect of
referring the properties of the distribution of the transformation T = f(X) of
Theorem 3.1 back to those of the distribution of a related variate Y. In the
applications given in the present section, we shall let $\psi_n(\mu_n)$ be proportional to
the reciprocal of the standard deviation of X. The theorems of section 3 state
in this case that if the reduced, or standardized, distribution of X approaches a
limiting form, then under certain circumstances, the distribution of
$f(X) - f(\mu_n)$ will approach a similar limiting form, and $\sigma_T^2$ will approach a quantity
independent at least of n. In the applications considered here, the reduced
distribution of X will always approach the reduced normal distribution.


(I) The square root transformation for a variate with a Poisson exponential
distribution. Let X have a Poisson exponential distribution with parameter n.
If $\alpha$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{X + \alpha}, & X \ge -\alpha \\ 0, & X < -\alpha, \end{cases} \tag{4.1}$$

then the distribution of $T - \sqrt{n + \alpha}$ tends as $n \to \infty$ to a normal distribution
which has mean zero and variance $\tfrac14$, and $\lim_{n\to\infty} \sigma_T^2 = \tfrac14$. For $\mu_n = n$, $\sigma_X = \sqrt{n}$,
and it is well known⁶ that the distribution of the reduced variate $(X - n)/\sqrt{n}$
tends to the reduced normal distribution as $n \to \infty$. By Theorem 3.3, the
distribution of the variate

$$Y = \frac{X - n}{2\sqrt{n + \alpha}} = \frac{\sqrt{n}}{2\sqrt{n + \alpha}} \cdot \frac{X - n}{\sqrt{n}}$$

will tend to normality as $n \to \infty$, and the variance of Y will tend to the value $\tfrac14$,
which is also the variance of the limiting distribution. Setting


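$$\psi_n(x) = \begin{cases} \dfrac{1}{2\sqrt{x + \alpha}}, & x > -\alpha \\ 0, & x \le -\alpha, \end{cases}$$

we obtain from $T = f(X) = \int_{-\alpha}^{X} \psi_n(x)\,dx$ the formula given in (4.1). To prove
the statement in italics, we must show that conditions (ii) and (iii) of Theorem
3.2 are satisfied. We have, assuming $n > -\alpha$,

$$q_n(w) = \begin{cases} \left(1 + \dfrac{2w}{\sqrt{n + \alpha}}\right)^{-1/2}, & w > -\tfrac12\sqrt{n + \alpha} \\ 0, & w \le -\tfrac12\sqrt{n + \alpha}, \end{cases}$$

so clearly (ii) is satisfied. Also,

$$W = f(Y[\psi_n(\mu_n)]^{-1} + \mu_n) - f(\mu_n)
= \begin{cases} \sqrt{2Y\sqrt{n + \alpha} + n + \alpha} - \sqrt{n + \alpha}, & Y > -\tfrac12\sqrt{n + \alpha} \\ -\sqrt{n + \alpha}, & Y \le -\tfrac12\sqrt{n + \alpha}, \end{cases}$$

from which it follows at once that $|W| < 2|Y|$ for all Y, and so (iii) is satisfied.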

The degree of approximation involved in the equation $\lim_{n\to\infty} \sigma_T^2 = \tfrac14$ has been
investigated numerically by Bartlett [1] for values of n from .5 to 15.0 in the
cases $\alpha = 0$ and $\alpha = \tfrac12$. He found that the variance of $\sqrt{X + \tfrac12}$ is
considerably closer to the limit ($\tfrac14$) for $1 \le n \le 10$ than is the variance of $\sqrt{X}$. At
n = 15, the variance of $\sqrt{X}$ is .256, and that of $\sqrt{X + \tfrac12}$ is .248.
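Figures of this kind can be recomputed directly. The following Python sketch is an illustration and not Bartlett's own procedure: it sums the Poisson probabilities (truncating the series at a hypothetical cutoff `tail`) to obtain the variance of $\sqrt{X + \alpha}$ for $\alpha \ge 0$, so that the approach to the limit $\tfrac14$ can be inspected for any n.

```python
import math


def poisson_pmf(k: int, n: float) -> float:
    """P(X = k) for a Poisson variate with parameter n, computed via logarithms."""
    return math.exp(-n + k * math.log(n) - math.lgamma(k + 1))


def variance_sqrt(n: float, alpha: float, tail: int = 400) -> float:
    """Variance of sqrt(X + alpha), X ~ Poisson(n), alpha >= 0.

    The infinite series is truncated at `tail` terms, which is ample when n
    is well below `tail`.
    """
    m1 = m2 = 0.0
    for k in range(tail):
        p = poisson_pmf(k, n)
        m1 += p * math.sqrt(k + alpha)
        m2 += p * (k + alpha)
    return m2 - m1 * m1


if __name__ == "__main__":
    for n in (1.0, 5.0, 15.0, 100.0):
        print(n, variance_sqrt(n, 0.0), variance_sqrt(n, 0.5))
```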

The question of the degree of convergence to normality and of the possibility
of selecting an optimum value of $\alpha$ remain open. By expanding the function
$\sqrt{X + \alpha}$ in a Taylor series about X = n with remainder in the form due to
Schlömilch, it is possible to derive as accurate an estimate of $|\sigma_T^2 - (\tfrac14)|$ as may
be desired. A rough result easily obtainable by this method is that
$|\sigma_T^2 - (\tfrac14)| \le 3/(4n)$, n > 0.

⁶ See (e.g.) [9].

(II) The square root transformation for a variate with a $\Gamma$ distribution.
Let X have a distribution whose density function is of the following type:

$$\varphi(x) = \begin{cases} K x^{\frac{n}{2} - 1} e^{-hx}, & x > 0 \\ 0, & x \le 0, \end{cases} \qquad h > 0. \tag{4.2}$$

If $\alpha$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{X + \alpha}, & X \ge -\alpha \\ 0, & X < -\alpha, \end{cases} \tag{4.3}$$

then the distribution of $T - \sqrt{(n/2h) + \alpha}$ tends as $n \to \infty$ to a normal
distribution which has mean zero and variance $1/(4h)$, and $\lim_{n\to\infty} \sigma_T^2 = 1/(4h)$. For
$\mu_n = n/(2h)$, $\sigma_X = \sqrt{n}/(h\sqrt{2}) = \sqrt{\mu_n/h}$. The distribution of the reduced variate
tends to normality as $n \to \infty$,⁷ so that of the variate

$$Y = \frac{X - \mu_n}{2\sqrt{\mu_n + \alpha}} = \frac{\sqrt{n}}{2\sqrt{nh + 2h^2\alpha}} \cdot \frac{X - \mu_n}{\sigma_X}$$

tends to normality also with limiting variance $1/(4h)$. Setting

$$\psi_n(x) = \begin{cases} \dfrac{1}{2\sqrt{x + \alpha}}, & x > -\alpha \\ 0, & x \le -\alpha, \end{cases}$$

we obtain T in (4.3) from the relation $T = \int_{-\alpha}^{X} \psi_n(x)\,dx$. The work of verifying
that the conditions of Theorem 3.2 are satisfied is the same as in the case of the
Poisson exponential distribution treated above, and will not be repeated.

For example, if $s^2$ denotes the variance of a random sample of n + 1
observations drawn from a normal parent distribution with variance $\sigma^2$, then it is well
known that $(n + 1)s^2$ is distributed according to (4.2) with $h = 1/(2\sigma^2)$. We
thus can deduce the further facts, also well known, that the distribution of
$\sqrt{n + 1}\,s - \sigma\sqrt{n}$ tends to normality, and that the variance of $s\sqrt{n + 1}$
approaches the limiting value $\tfrac12\sigma^2$. If n is an integer and $h = \tfrac12$, the distribution
defined by (4.2) is called a $\chi^2$ distribution with n degrees of freedom, and the
variate is often denoted by $\chi^2$. Our conclusion in this case is that the
distribution of $\sqrt{2\chi^2} - \sqrt{2n}$ tends to a normal one with zero mean and unit variance.
From this result and the fact that $\sqrt{2n - 1} - \sqrt{2n} = O(n^{-1/2})$, it follows
immediately that $\sqrt{2\chi^2} - \sqrt{2n - 1}$ has the same limiting distribution as
$\sqrt{2\chi^2} - \sqrt{2n}$. This result,⁸ due to Fisher, is familiar to all users of his table of
the probability levels of $\chi^2$.
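A quick numerical check of this conclusion is possible because the mean of $\sqrt{2\chi^2}$ has the closed form $2\,\Gamma((n+1)/2)/\Gamma(n/2)$, a standard property of the $\chi$ distribution that is assumed here rather than taken from the paper. The sketch below (function names are illustrative) evaluates the exact mean and variance of $\sqrt{2\chi^2}$ so that they can be compared with $\sqrt{2n - 1}$ and with unity.

```python
import math


def mean_sqrt_2chi2(n: int) -> float:
    """E[sqrt(2 * chi2_n)] = 2 * Gamma((n + 1) / 2) / Gamma(n / 2)."""
    return 2.0 * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))


def variance_sqrt_2chi2(n: int) -> float:
    """Var[sqrt(2 * chi2_n)] = E[2 * chi2_n] - (E[sqrt(2 * chi2_n)])^2."""
    return 2.0 * n - mean_sqrt_2chi2(n) ** 2


if __name__ == "__main__":
    for n in (5, 30, 100):
        print(n, mean_sqrt_2chi2(n) - math.sqrt(2 * n - 1), variance_sqrt_2chi2(n))
```

The first printed column is the mean of $\sqrt{2\chi^2} - \sqrt{2n - 1}$ and the second is its variance.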


⁷ See (e.g.) [9].

⁸ For a discussion of the degree of convergence involved here, see [9].

(III) The inverse sine transformation for a binomial variate. Let X have
a binomial relative frequency distribution with parameter p and the n + 1 values 0, 1/n,
2/n, ..., n/n. If $\alpha$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{n}\,\sin^{-1}\sqrt{X + \dfrac{\alpha}{n}}, & -\dfrac{\alpha}{n} \le X \le 1 - \dfrac{\alpha}{n} \\ 0, & X < -\dfrac{\alpha}{n},\ X > 1 - \dfrac{\alpha}{n}, \end{cases} \tag{4.4}$$

where T is measured in radians, then the distribution of $T - \sqrt{n}\,\sin^{-1}\sqrt{p + (\alpha/n)}$
tends as $n \to \infty$ to a normal distribution which has mean zero and variance $\tfrac14$, and
$\lim_{n\to\infty} \sigma_T^2 = \tfrac14$. For here, $\mu_n = p$, and $\sigma_X^2 = pq/n$, where q = 1 - p; and the
familiar DeMoivre-Laplace theorem states that the distribution of the reduced
variate $\sqrt{n}(X - p)/\sqrt{pq}$ will tend to normality as $n \to \infty$. Hence by
Theorem 3.3 the distribution of

$$Y = \frac{\sqrt{n}(X - p)}{2\sqrt{\left(p + \dfrac{\alpha}{n}\right)\left(q - \dfrac{\alpha}{n}\right)}} \tag{4.5}$$

will tend to normality with a limiting variance of $\tfrac14$, which is also the variance of
the limiting distribution. Setting

$$\psi_n(x) = \begin{cases} \dfrac{\sqrt{n}}{2\sqrt{\left(x + \dfrac{\alpha}{n}\right)\left(1 - x - \dfrac{\alpha}{n}\right)}}, & -\dfrac{\alpha}{n} < x < 1 - \dfrac{\alpha}{n} \\ 0, & x \le -\dfrac{\alpha}{n},\ x \ge 1 - \dfrac{\alpha}{n}, \end{cases}$$

we obtain (4.4) from the integral

$$T = \int_{-\alpha/n}^{X} \psi_n(x)\,dx.$$

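In proving that conditions (ii) and (iii) of Theorem 3.2 are satisfied, we shall
assume for simplicity that $\alpha = 0$. We find that

$$q_n(w) = \begin{cases} \left[\left(1 + 2w\sqrt{\dfrac{q}{np}}\right)\left(1 - 2w\sqrt{\dfrac{p}{nq}}\right)\right]^{-1/2}, & -\tfrac12\sqrt{\dfrac{np}{q}} < w < \tfrac12\sqrt{\dfrac{nq}{p}} \\ 0 & \text{otherwise,} \end{cases}$$

so obviously (ii) is satisfied. From the Law of the Mean in the form due to
Schlömilch, we have

$$W = \sqrt{n}\,\sin^{-1}\sqrt{p + 2\sqrt{\frac{pq}{n}}\,Y} - \sqrt{n}\,\sin^{-1}\sqrt{p}$$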


(4.6) 


= 27 


i - e 




u 1 Vi F ). 


- l i/vi. 

W « 




0 < 6 < 1 , 
The denominator of the coefficient of 2Y here is a quadratic function of Y with a
negative coefficient of $Y^2$, and so must assume its least value in the Y range
indicated in (4.6) at one end or the other of the range. From this it is readily
seen that the coefficient of 2Y is actually always less than unity. For values of
Y outside the range, the second member of (4.6) indicates that $W = O(\sqrt{n}) = O(Y)$.
Hence (iii) is satisfied, and the proof of the statement in italics is
complete for the case $\alpha = 0$. The more general case presents no important new
difficulties.

In practice, it is often convenient to express X as a percentage. This merely
has the effect of multiplying Y in (4.5) by 100. We find in this case that
$\sqrt{n}\,\sin^{-1}\sqrt{X + 100\alpha/n} - \sqrt{n}\,\sin^{-1}\sqrt{100p + 100\alpha/n}$ has a distribution
approaching normality, and $\sigma_T \to 50$ instead of $\tfrac12$.

Bartlett [1] gives numerical results in the cases n = 10, $\alpha = 0$ and n = 10,
$\alpha = \tfrac12$ which indicate that perhaps the choice $\alpha = \tfrac12$ is more suitable if the
estimated p is near 0 or 1, but the choice $\alpha = 0$ is preferable if the estimated p
lies between .3 and .7. However, there seems to be no good reason to believe
that these conclusions should be valid for other values of n. The question of an
optimum $\alpha$, and of the degree of convergence to normality remain open. We
note in passing that the latter problem could doubtless be profitably studied by
combining the methods of proof of Theorem 3.1 with the results of Uspensky
[15, pp. 129-130] on the degree of approximation of the reduced binomial d.f.
to the normal d.f.
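For readers who wish to examine other values of n, the variance of the transformed variate in (4.4) can be computed exactly by summing over the binomial probabilities. The Python sketch below is an illustration under that direct-summation approach (it is not Bartlett's method, and the function names are hypothetical). X is the relative frequency k/n, and T is set to zero outside the range allowed in (4.4).

```python
import math


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """log P(k successes in n trials), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def variance_arcsine(n: int, p: float, alpha: float = 0.0) -> float:
    """Exact variance of T = sqrt(n) * asin(sqrt(X + alpha/n)), X = k/n."""
    m1 = m2 = 0.0
    for k in range(n + 1):
        x = k / n + alpha / n
        # T is defined as 0 when X + alpha/n falls outside [0, 1].
        t = math.sqrt(n) * math.asin(math.sqrt(x)) if 0.0 <= x <= 1.0 else 0.0
        weight = math.exp(binomial_log_pmf(k, n, p))
        m1 += weight * t
        m2 += weight * t * t
    return m2 - m1 * m1


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, variance_arcsine(n, 0.3), variance_arcsine(n, 0.3, alpha=0.5))
```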

(IV) Other transformations of a binomial variate. Let X have a binomial
relative frequency distribution with the parameter p and the n + 1 values 0, 1/n, 2/n, ...,
n/n.

(a) If

$$T = f(X) = \begin{cases} \sqrt{n}\,\sinh^{-1}\sqrt{X} = \sqrt{n}\,\log\left(\sqrt{X} + \sqrt{1 + X}\right),{}^{9} & X \ge 0 \\ 0, & X < 0, \end{cases}$$

then the distribution of $T - \sqrt{n}\,\sinh^{-1}\sqrt{p}$ tends as $n \to \infty$ to a normal
distribution which has mean zero and variance $q/(4 + 4p)$, and $\lim_{n\to\infty} \sigma_T^2 = q/(4 + 4p)$.

(b) If

$$T = f(X) = \begin{cases} \sqrt{n}\,\log X, & X > 0 \\ 0, & X \le 0, \end{cases}$$

then the distribution of $T - \sqrt{n}\,\log p$ tends as $n \to \infty$ to a normal distribution
which has mean zero and variance $q/p$, and $\lim_{n\to\infty} \sigma_T^2 = q/p$.

(c) If

$$T = f(X) = \begin{cases} \tfrac12\sqrt{n}\,\log\dfrac{X}{1 - X}, & 0 < X < 1 \\ 0, & X \le 0,\ X \ge 1, \end{cases}$$

then the distribution of $T - \tfrac12\sqrt{n}\,\log\dfrac{p}{1 - p}$ tends as $n \to \infty$ to a normal
distribution which has mean zero and variance $1/(4pq)$, and $\lim_{n\to\infty} \sigma_T^2 = 1/(4pq)$.

⁹ All logarithms in this paper are to the base e.

Since the limiting variance of each of these transformations involves the
parameter p, they are not to be regarded as solutions of the problem of
asymptotic variance stabilization proposed at the beginning of section 3, although it is
perhaps of some interest that their distributions become asymptotically normal.
In case (a), $f'(x) = \sqrt{n}/(2\sqrt{x^2 + x})$, x > 0. Setting $\psi_n(x) = f'(x)$, x > 0,
and $\psi_n(x) = 0$, $x \le 0$, we obtain

$$Y = (X - p)\psi_n(p) = \frac{\sqrt{n}(X - p)}{\sqrt{pq}} \cdot \frac{\sqrt{q}}{2\sqrt{1 + p}}, \tag{4.7}$$

and this variate obviously has the limiting distribution ascribed to
$T - \sqrt{n}\,\sinh^{-1}\sqrt{p}$ in the statement in italics. The truth of that statement now
follows by an argument similar to that used in the case of the inverse sine
transformation.
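The three limiting variances stated in (a), (b) and (c) can be compared with exact finite-n values. The sketch below is only an illustrative check with hypothetical function names: it sums over the binomial probabilities for the relative frequency X = k/n, applies each transformation as defined above (including the zero branch), and returns the exact variance, to be set beside $q/(4 + 4p)$, $q/p$ and $1/(4pq)$.

```python
import math


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """log P(k successes in n trials), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def t_sinh(x: float, n: int) -> float:
    """Case (a): sqrt(n) * asinh(sqrt(x)) for x >= 0, else 0."""
    return math.sqrt(n) * math.asinh(math.sqrt(x)) if x >= 0 else 0.0


def t_log(x: float, n: int) -> float:
    """Case (b): sqrt(n) * log(x) for x > 0, else 0."""
    return math.sqrt(n) * math.log(x) if x > 0 else 0.0


def t_logit(x: float, n: int) -> float:
    """Case (c): (1/2) * sqrt(n) * log(x / (1 - x)) for 0 < x < 1, else 0."""
    return 0.5 * math.sqrt(n) * math.log(x / (1.0 - x)) if 0 < x < 1 else 0.0


def exact_variance(transform, n: int, p: float) -> float:
    """Exact variance of transform(X, n) when X = k/n is a binomial proportion."""
    m1 = m2 = 0.0
    for k in range(n + 1):
        weight = math.exp(binomial_log_pmf(k, n, p))
        t = transform(k / n, n)
        m1 += weight * t
        m2 += weight * t * t
    return m2 - m1 * m1


if __name__ == "__main__":
    n, p = 2000, 0.3
    q = 1.0 - p
    print(exact_variance(t_sinh, n, p), q / (4 + 4 * p))
    print(exact_variance(t_log, n, p), q / p)
    print(exact_variance(t_logit, n, p), 1 / (4 * p * q))
```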

If p is allowed to vary with n in such a way that $\lim_{n\to\infty} np = \infty$, it is known
that the reduced distribution of X will still tend to normality.¹⁰ If we suppose
that $\lim_{n\to\infty} p = 0$, but $\lim_{n\to\infty} np = \infty$, we find from Theorem 3.3 that the
limiting distribution of Y in (4.7) will be normal with mean zero and variance
$\tfrac14$, and that $\sigma_Y^2 \to \tfrac14$. It is easily verified that the conditions (ii) and (iii) of
Theorem 3.2 are still satisfied, so we find that the limiting distribution of
$[\sqrt{n}\,\sinh^{-1}\sqrt{X} - \sqrt{n}\,\sinh^{-1}\sqrt{p}]$ is normal, with mean zero and variance $\tfrac14$, and
$\sigma_T^2 \to \tfrac14$. However, since n is now the only independent parameter, we cannot
here regard the transformation $T = \sqrt{n}\,\sinh^{-1}\sqrt{X}$ as a solution of the problem
of variance stabilization, because the variate T depends explicitly upon n.

If in case (b) we proceed as in case (a), we obtain as the analogue of (4.7)
the formula

$$Y = (X - p)\psi_n(p) = \frac{\sqrt{n}(X - p)}{\sqrt{pq}} \cdot \sqrt{\frac{q}{p}},$$

and this variate has the limiting distribution ascribed to $T - \sqrt{n}\,\log p$ in the
statement in italics. It now turns out that although condition (ii) of Theorem
3.2 is satisfied, condition (iii) is not satisfied. We are then faced with the
problem of proving directly that the improper integral

$$\int_{-\sqrt{n}} \left[\sqrt{n}\,\log\left(p + \frac{py}{\sqrt{n}}\right) - \sqrt{n}\,\log p\right]^2 dF_n(y)$$

converges uniformly.¹¹ The trouble occurs only at the lower limit of
integration, and may be resolved by first integrating by parts, then dividing the range
$(-\sqrt{n}, A_1)$ into two ranges $(-\sqrt{n}, -\log n)$ and $(-\log n, A_1)$, and then
applying Uspensky's results [15, pp. 129-130], on the degree of approximation
involved in the DeMoivre-Laplace theorem.

¹⁰ See (e.g.) [9].

¹¹ See the remarks following the proof of Theorem 3.2.

Case (c) may be handled in a similar manner. 

5. The logarithmic transformation. We shall suppose throughout this section
that X is a variate whose mean $\mu_n$ and standard deviation $\sigma_X$ are in the relation
$\sigma_X = k_n(\mu_n + \alpha)$, where $\alpha$ is an arbitrary constant, $k_n > 0$, and $\lim_{n\to\infty} k_n$ exists
and is finite. If $k_n$ is constant for all n, say $k_n = k > 0$, and if we use the
heuristic argument of the second paragraph of section 2 to attempt to find a
transformation which will stabilize the variance of X at $k^2$, we arrive at the
function $T = \log(X + \alpha)$, $X > -\alpha$. It is the purpose of this section to study
the asymptotic properties of this transformation.

The theory of such a transformation differs in certain important respects
from that of the transformations considered in sections 3 and 4. For one thing,
our starting point in the study of each transformation considered in section 4 was
the fact that although P(X < 0) = 0, nevertheless the reduced distribution of
X tended to normality as $n \to \infty$. But in the present case, if X is a variate such
that $P(X \le -\alpha) = 0$, then the corresponding reduced variate
$Y = (X - \mu_n)/[k_n(\mu_n + \alpha)]$ has a d.f. $F_n(y)$ such that $F_n(-1/k_n) = 0$. Thus if
$\lim_{n\to\infty} k_n = k > 0$, the limiting distribution of Y, if it exists, must have a d.f. F(y) such that
$F(-1/k - 0) = 0$. Therefore the limiting distribution of Y can never be
normal if k > 0.

Moreover (in contrast to the situation in Theorem 3.1) if the reduced variate
Y does have a limiting distribution, the variate

$$W = \frac{\log(X + \alpha) - \log(\mu_n + \alpha)}{k_n} = \frac{1}{k_n} \int_{\mu_n}^{X} \frac{du}{u + \alpha}, \qquad X > -\alpha, \tag{5.1}$$

may have a limiting distribution which is not the same as that of Y. More
specifically, we have the following result:

Theorem 5.1. Let $P(X \le -\alpha) = 0$, let $\lim_{n\to\infty} k_n = k \ge 0$, let $F_n(y)$ be the
d.f. of the reduced variate

$$Y = \frac{X - \mu_n}{k_n(\mu_n + \alpha)},$$

and let $H_n(w)$ be the d.f. of the variate W given by (5.1). If a continuous d.f. F(y)
exists such that $\lim_{n\to\infty} F_n(y) = F(y)$ for all y, then

$$\lim_{n\to\infty} H_n(w) = \begin{cases} F\left[\dfrac{e^{kw} - 1}{k}\right], & k > 0 \\ F(w), & k = 0. \end{cases}$$

The proof is simpler than the statement; essentially we have only to notice that

$$H_n(w) = F_n\left[\frac{e^{k_n w} - 1}{k_n}\right], \qquad -\infty < w < +\infty,$$

and apply the reasoning used above in connection with (3.4).

From the study of the distribution of T, we now turn for a moment to the
question of the limit of $\sigma_T^2$. Here the situation is more consistent with the
results of section 3.

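Theorem 5.2. Under the hypotheses of Theorem 5.1 and under the additional
conditions that the improper integral $\int_{-\infty}^{+\infty} w^2\,dH_n(w)$ (or
$\int_{-1/k_n}^{+\infty} k_n^{-2}[\log(1 + k_n y)]^2\,dF_n(y)$) converges uniformly in n and that
$\int_{-\infty}^{+\infty} y^2\,dF(y) = 1 = E(Y^2)$, the following relations hold:

$$\lim_{n\to\infty} E(W) = \begin{cases} \displaystyle\int_{-1/k}^{+\infty} \frac{1}{k}\log(1 + ky)\,dF(y), & k > 0 \\ 0, & k = 0, \end{cases} \tag{5.2}$$

$$\lim_{n\to\infty} E(W^2) = \begin{cases} \displaystyle\int_{-1/k}^{+\infty} \frac{1}{k^2}[\log(1 + ky)]^2\,dF(y), & k > 0 \\ 1, & k = 0. \end{cases} \tag{5.3}$$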

The variance $\sigma_T^2$ of the variate $T = \log(X + \alpha)$ is related to these mean values
by the equation $\sigma_T^2 = k_n^2\{E(W^2) - [E(W)]^2\}$. Thus if F(y) is independent of
any unknown parameters $\theta$, and if k is positive and is presumed to have the same
value for all variates in any given problem, then the transformation
$T = \log(X + \alpha)$ is seen to yield an asymptotic stabilization of the variance under
the conditions of Theorem 5.2. If k = 0, we find from either Theorem 5.2 or
the proof of Theorem 5.2 that $T = \log(X + \alpha)$ converges stochastically to
$\log(\mu_n + \alpha)$.

The proof of Theorem 5.2 is similar to that of Theorem 3.2 and will be omitted.

Theorem 5.1 raises the following question: Just what limiting distribution
must Y have if k > 0, in order that the distribution of W tend to normality?
To answer this, we shall note the following simple non-asymptotic result:

Theorem 5.3. A necessary and sufficient condition that X have a continuous
distribution with density function

$$\varphi(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi\log(k^2 + 1)}} \cdot \dfrac{1}{x + \alpha} \exp\left\{-\dfrac{\left[\log\left(\dfrac{x + \alpha}{\mu + \alpha}\sqrt{k^2 + 1}\right)\right]^2}{2\log(k^2 + 1)}\right\}, & x > -\alpha \\ 0, & x \le -\alpha, \end{cases} \tag{5.4}$$

for which $\sigma_X = k(\mu + \alpha)$, is that the variate $T = \log(X + \alpha)$ have a normal
distribution with mean $\log(\mu + \alpha) - \log\sqrt{k^2 + 1}$ and variance $\log(k^2 + 1)$.

The proof may be given by a routine change of variables.¹² It is to be noticed
that the heuristic argument of the second paragraph of section 2 would lead to
the incorrect result that the variance of T was $k^2$ instead of $\log(k^2 + 1)$. In
case k = 1, the mean and variance of T are respectively $\log(\mu + \alpha) - .347$ and
.693. If the transformation $T = \log_{10}(X + \alpha)$ is used, the new mean is
$\log_{10}(\mu + \alpha) - \log_{10}\sqrt{k^2 + 1}$ and the new variance is $.189\log(k^2 + 1)$, which
for values of k near zero has the approximate value $.189k^2$.¹³

If X is distributed according to (5.4), the density function $F'(y)$ of the
corresponding reduced variate $Y = (X - \mu)/[k(\mu + \alpha)]$ is

$$F'(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi\log(k^2 + 1)}} \cdot \dfrac{k}{1 + ky} \exp\left[-\dfrac{\left\{\log\left[(1 + ky)\sqrt{k^2 + 1}\right]\right\}^2}{2\log(k^2 + 1)}\right], & y > -\dfrac{1}{k} \\ 0, & y \le -\dfrac{1}{k}. \end{cases} \tag{5.5}$$

The d.f. of the variate $W = k^{-1}[\log(X + \alpha) - \log(\mu + \alpha)]$ is $F[(e^{kw} - 1)/k]$,
and, of course, the distribution of W is normal with mean $-k^{-1}\log\sqrt{k^2 + 1}$,
and variance $k^{-2}\log(k^2 + 1)$. These are the respective values of the integrals
in (5.2) and (5.3).

If now the distribution of X depends on a parameter n in such a way that as
$n \to \infty$ the distribution of the corresponding reduced variate
$Y = (X - \mu_n)/[k_n(\mu_n + \alpha)]$ tends to the distribution given by (5.5), it follows from the above
remarks and from Theorem 5.1 that the variate W given by (5.1) has a normal
limiting distribution. Furthermore, under the uniform convergence condition
of Theorem 5.2, it follows that $\sigma_T^2$ tends to the value $\log(k^2 + 1)$, where
$T = \log(X + \alpha)$.

These facts provide a sound mathematical basis for the use of the logarithmic
transformation, which has had a long history of empirical success in problems of
normalization [12, chapter XVI] and stabilization ([6], [16]). When it appears
from a reasonably large number of observations on a variate (which is essentially
bounded from below) that the standard deviation of the variate is proportional
to the mean, then a possible specification for the variate is a distribution of the
form (5.4); or, at least for large values of $\mu$, it may be assumed that the
distribution of the reduced variate is given by (5.5). Then the variate $T = \log(X + \alpha)$,
where $-\alpha$ is any number less than the lower bound of X, will be exactly or
approximately normally distributed with a variance independent of the value of $\mu$.

Since (5.4) is only one of an infinity of various different types of distribution
in which the mean and standard deviation are proportional, the user of a
logarithmic transformation in the analysis of variance should always apply tests for
departure from normality to the observed distribution of T values. From the
point of view of specification, the situation here would seem to be less reassuring
than in the cases considered in section 4. While it is true that the Poisson
exponential distribution is only one of many types of distribution in which the
variance and mean are equal, nevertheless the specification of a Poisson
distribution can generally be preceded by a fairly strong chain of a priori inductive
reasoning. This would not seem to be the case in the specification of (5.4).
Theorems 5.1 and 5.2 furnish some grounds for a suspicion that the logarithmic
transformation may possibly be more successful in stabilizing the variance than
in normalizing the data. The burden of proof, however, lies with the
experimenter.¹⁴

¹² Finney [11] has considered the problem of efficiently estimating the variance of the X
of Theorem 5.3 in the case $\alpha = 0$. (The actual density function (5.4) appears nowhere in
his paper.)

¹³ Given (without explanation) by Cochran [6, p. 166].
ON FUNDAMENTAL SYSTEMS OF PROBABILITIES OF A FINITE NUMBER OF EVENTS


By Kai Lai Chung 
Tsing Hua University, Kunming, China 


We consider a probability function P(E) defined over the Borel set of events 
generated by the n arbitrary events Ei, • • * , E n , which will be denoted by 

£(i, 

We use the same notations as in the author’s former paper 1 , with the following 
abbreviations. We denote a combination (m • • ■ a a ) simply by (a), and use 
the corresponding Latin letter a for its number of members. Similarly we write 
(0) for (di ■ ■ ■ A>), but ( v ) for (1, • • • , n) We say that ($) belongs to (a) and 
write (0 ) e (a) when and only when the set (/Si - * * 0b) is a subset of (ai • ■ ■ a„). 
Then and then only we write (a) — (0) for the subset of elements of (a) that do 
not belong to ( 0 ); thus we may write it as (y) with c = a — b. When and only 
when (a) and ( 0 ) have no common elements, we write (a) + (0) for the set of 
elements that belong either to (a) or to (/3); thus we may write it as (y), with 
c = a + b £ n. We note the case for empty sets: (0) + (0) = (0). Now we 
can write p l{a)] for , p«.» for p„ r .. na , p b ((a)) for p b (cti ■ • - a a ), etc. 

Further we denote by pm ((a)) (1 ^ | a i n) the probability of the occurrence 

of exactly b events out of E ai , • • • , E« a , and write 

P. w (W)- £ P~(M), Pl m] ((*)Y =• £ Pw((«))i 

(o) « (») (a) < (r) 


since a is fixed by the left-hand sides, the summations on the right-hand sides 
are to be extended to all the (^Vcombinations of (v). 


A sum written 52 i a to be extended to all combinations (0), b = 0, 1, • • • , a 

W«(a) 

belonging to (a), when b is not previously fixed; it is to be extended to all the 
^^-combinations belonging to (a), when b is previously fixed. 

Definition 1. A system of quantities is said to form a fundamental system of 
probabilities for a set of events if and only if the probability of every event in the 
set can be expressed in terms of these quantities. 

Definition 2. An event in £$(1, \dots, n)$ is said to be symmetrical if and only 
if it is identical with every event obtained by interchanging any pair of suffixes 
$(i, j)$ $(i, j = 1, \dots, n)$ in the definition of it. The subset of symmetrical events 
in £$(1, \dots, n)$ will be denoted by S$(1, \dots, n)$. 

From the normal form 2 of every event in £$(1, \dots, n)$ and the principle of 
total probabilities, we can easily see the truth of the following theorems, which 
may of course be made more precise. 

1 "On the probability of the occurrence of at least m events among n arbitrary events," 
Annals of Math. Stat., Vol. 12, 1941. 

2 See Hilbert-Ackermann, Grundzüge der theoretischen Logik, Chap. 1. 

Theorem. The system of $p_{[(\alpha)]}$, $(\alpha) \in (\nu)$, $2^n$ in number, forms a fundamental 
system for £$(1, \dots, n)$. 

Theorem. The system of $p_{[a]}((\nu))$, $0 \le a \le n$, $n + 1$ in number, forms a 
fundamental system for S$(1, \dots, n)$. 

Next, a theorem of Broderick 3 , in a less precise form, may be stated: 

The system of $p_{(\alpha)}$ ($p_{(0)} = 1$), $(\alpha) \in (\nu)$, $2^n$ in number, forms a fundamental 
system for £. 

We may add in an easy way the following 

Theorem. The system of $S_a((\nu))$, $S_0((\nu)) = 1$, $0 \le a \le n$, $n + 1$ in number, 
forms a fundamental system for S. 

In the present paper we shall prove, inter alia, the following four theorems 
of the above type, stated in more precise forms. 

Theorem 1. For any E in £, we have 

$$P(E) = c_0 + \sum_{(\alpha) \in (\nu),\ a \ne 0} c_{(\alpha)}\, p_1((\alpha)),$$

where $c_0 = 0$ or 1 and the $c_{(\alpha)}$'s are integers; and they are unique 4 . 

Theorem 2. For any E in S, we have 

$$P(E) = c_0 + \sum_{a=1}^{n} c_a\, P_1^{(a)}((\nu)),$$

where $c_0 = 0$ or 1 and the $c_a$'s are integers; and they are unique. 

Theorem 3. For any E in £, we have 

$$P(E) = d_0 + \sum_{(\alpha) \in (\nu)} d_{(\alpha)}\, p_{[1]}((\alpha)),$$

where $d_0 = 0$ or 1 and the $d_{(\alpha)}$'s are rational numbers; and they are unique. 

Theorem 4. For any E in S, we have 

$$P(E) = d_0 + \sum_{a=1}^{n} d_a\, P_{[1]}^{(a)}((\nu)),$$

where $d_0 = 0$ or 1 and the $d_a$'s are rational numbers; and they are unique. 

Less precisely, we may say that the system of $p_1((\alpha))$ or $p_{[1]}((\alpha))$ forms a 
fundamental system for £; the system of $P_1^{(a)}((\nu))$ or $P_{[1]}^{(a)}((\nu))$ forms a 
fundamental system for S. 

In fact, however, we shall give much more than the mere proofs of 
these theorems. We shall establish the following explicit formulas for the 
general parameter m. 

3 Fréchet, "Compléments à un théorème de T. S. Broderick concernant les événements 
dépendants," Proc. Edinburgh Math. Soc., Ser. 2, Vol. 6 (1939). 

4 "Unique" in the sense that it is impossible to replace therein the coefficients c by other 
numbers which are independent of the Borel set of events and the probability function. 

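(1.1) (i) $p_{[(0)]} = 1 - p_1((\nu))$; 

(ii) $$p_{[(\alpha)]} = \sum_{\substack{b=0 \\ n-a+b \ne 0}}^{a} (-1)^{b-1} \sum_{(\beta) \in (\alpha)} p_1((\nu) - (\alpha) + (\beta)), \qquad 1 \le a \le n.$$

(1) $$p_{[(\alpha)]} = (-1)^m\, \frac{m-1}{n-1} \sum_{c=m}^{n}\ \sum_{d=\max(0,\,c-a)}^{\min(c,\,n-a)} (-1)^{c-d} \binom{n-2}{a+d-m}^{-1} \sum_{\substack{(\delta) \in (\nu)-(\alpha) \\ (\gamma)-(\delta) \in (\alpha)}} p_m((\gamma) - (\delta) + (\delta)), \qquad n \ge a \ge m \ge 2.$$ 5 

(2.1) $$p_{[a]}((\nu)) = \sum_{\substack{b=n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} P_1^{(b)}((\nu)), \qquad 1 \le a \le n.$$

(2) $$p_{[a]}((\nu)) = \sum_{b=m}^{n} (-1)^{b-m} L(n, a, b, m)\, P_m^{(b)}((\nu)), \qquad n \ge a \ge m \ge 2,$$

where 

$$L(n, a, b, m) = \begin{cases} 0, & b < n - a + m - 1, \\[4pt] \dfrac{(-1)^{n-a}\,(m-1)!\,(a-m+1)!}{a!}, & b = n - a + m - 1, \\[8pt] \dfrac{(-1)^{n-a}\,(m-1)!\,(b-m)!\,(a-m)!\,\{ab - n(m-1)\}}{a!\,(n-a)!\,(a+b-n-m+1)!}, & b > n - a + m - 1. \end{cases}$$

(3) (i) $$p_{[(0)]} = 1 - \frac{1}{n} \sum_{c=1}^{n} \binom{n-1}{c-1}^{-1} P_{[1]}^{(c)}((\nu));$$

(ii) $$p_{[(\alpha)]} = \frac{m}{n} \sum_{c=m}^{n}\ \sum_{d=\max(0,\,c-a)}^{\min(c,\,n-a)} (-1)^{c-m+d} \binom{n-1}{a+d-m}^{-1} \sum_{\substack{(\delta) \in (\nu)-(\alpha) \\ (\gamma)-(\delta) \in (\alpha)}} p_{[m]}((\gamma) - (\delta) + (\delta)), \qquad n \ge a \ge m \ge 1.$$

(4) $$p_{[a]}((\nu)) = \sum_{b=m+n-a}^{n} (-1)^{n-a+b-m} \binom{b-m}{n-a} \binom{a}{m}^{-1} P_{[m]}^{(b)}((\nu)), \qquad n \ge a \ge m \ge 1.$$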

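A simpler derivation of (1) than that given in an earlier paper 1 follows. Let 
us write Poincaré's formula as follows: 

$$p_m((\beta)) = \sum_{c=m}^{b} (-1)^{c-m} \binom{c-1}{m-1} S_c((\beta)).$$

5 Obviously we mean $((\nu) - (\alpha)) + (\beta)$ and $((\gamma) - (\delta)) + (\delta)$ respectively; similarly in the 
sequel. 

Then for a fixed $b \ge m$, summing over all $(\beta) \in (\nu)$, we get 

$$\sum_{(\beta) \in (\nu)} p_m((\beta)) = \sum_{c=m}^{b} (-1)^{c-m} \binom{c-1}{m-1} \binom{n-c}{b-c} S_c((\nu)).$$

Hence 

$$\sum_{b=m}^{n} (-1)^{b-m} \sum_{(\beta) \in (\nu)} p_m((\beta)) = \sum_{c=m}^{n} \binom{c-1}{m-1} S_c((\nu)) \sum_{b=c}^{n} (-1)^{b-c} \binom{n-c}{b-c} = \sum_{c=m}^{n} \binom{c-1}{m-1} S_c((\nu)) \cdot \begin{cases} 1 & \text{if } c = n, \\ 0 & \text{if } c < n. \end{cases}$$

A change of notation gives, for $a + b \ge m$, 

$$\binom{a+b-1}{m-1}\, p_{(\alpha)+(\beta)} = \sum_{c=m}^{a+b} (-1)^{c-m} \sum_{(\gamma) \in (\alpha)+(\beta)} p_m((\gamma)).$$

Hence 

$$\binom{a+b-1}{m-1} \sum_{(\beta) \in (\nu)-(\alpha)} p_{(\alpha)+(\beta)} = \sum_{c=m}^{a+b} (-1)^{c-m} \sum_{d=\max(0,\,c-a)}^{\min(b,\,c)} \binom{n-a-d}{b-d} \sum_{\substack{(\delta) \in (\nu)-(\alpha) \\ (\gamma)-(\delta) \in (\alpha)}} p_m((\gamma) - (\delta) + (\delta)).$$

Substituting in the well-known formula, for $a \ge 1$, 

$$p_{[(\alpha)]} = \sum_{b=0}^{n-a} (-1)^b \sum_{(\beta) \in (\nu)-(\alpha)} p_{(\alpha)+(\beta)},$$

we get for $n \ge a \ge m$ 

(1') $$p_{[(\alpha)]} = \sum_{c=m}^{n} (-1)^{c-m} \sum_{d=\max(0,\,c-a)}^{\min(c,\,n-a)} \left\{ \sum_{b=d}^{n-a} (-1)^b \binom{n-a-d}{b-d} \binom{a+b-1}{m-1}^{-1} \right\} \sum_{\substack{(\delta) \in (\nu)-(\alpha) \\ (\gamma)-(\delta) \in (\alpha)}} p_m((\gamma) - (\delta) + (\delta)).$$

Thus the problem reduces to the summation of the following series: 

$$\sum_{b=d}^{n-a} (-1)^b \binom{n-a-d}{b-d} \binom{a+b-1}{m-1}^{-1}.$$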

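Case 1: m = 1. In this case the series reduces to 

$$\sum_{b=d}^{n-a} (-1)^b \binom{n-a-d}{b-d} = \begin{cases} (-1)^{n-a} & \text{if } d = n - a, \\ 0 & \text{if } d < n - a. \end{cases}$$

Hence for $a \ge 1$, 

$$p_{[(\alpha)]} = \sum_{c=\max(1,\,n-a)}^{n} (-1)^{c-1} \sum_{(\gamma) - ((\nu)-(\alpha)) \in (\alpha)} p_1\bigl((\nu) - (\alpha) + (\gamma) - ((\nu) - (\alpha))\bigr)\, (-1)^{n-a}.$$

Writing $(\gamma) - ((\nu) - (\alpha)) = (\beta)$, we obtain 

$$p_{[(\alpha)]} = \sum_{b=\max(1-n+a,\,0)}^{a} (-1)^{b-1} \sum_{(\beta) \in (\alpha)} p_1((\nu) - (\alpha) + (\beta)).$$

This is equivalent to (1.1), (ii), while (i) is trivial. 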
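Formula (1.1), (ii) can be checked numerically on a small example. The sketch below is only an illustration, not part of the original argument: it assumes a randomly weighted probability function on the $2^n$ elementary outcomes, and the helper names (`random_atoms`, `p_at_least_one`, `rhs_formula_1_1_ii`) are hypothetical. It compares the right-hand side of (1.1), (ii) with the probability that exactly the events of $(\alpha)$ occur.

```python
from fractions import Fraction
from itertools import combinations, product
import random


def random_atoms(n, seed=0):
    """Exact random probabilities for the 2**n atoms (assumed test data).

    An atom is identified with the frozenset of indices of the events that occur.
    """
    rng = random.Random(seed)
    atoms = [frozenset(i for i in range(n) if bits[i])
             for bits in product((0, 1), repeat=n)]
    weights = [rng.randint(1, 9) for _ in atoms]
    total = sum(weights)
    return {atom: Fraction(w, total) for atom, w in zip(atoms, weights)}


def p_at_least_one(atoms, events):
    """p_1((events)): probability that at least one of the listed events occurs."""
    return sum(p for occurred, p in atoms.items() if occurred & events)


def rhs_formula_1_1_ii(atoms, n, alpha):
    """Right-hand side of (1.1), (ii) for the combination alpha (a >= 1)."""
    alpha = frozenset(alpha)
    complement = frozenset(range(n)) - alpha          # (nu) - (alpha)
    total = Fraction(0)
    for b in range(len(alpha) + 1):
        sign = 1 if b % 2 == 1 else -1                # (-1)**(b - 1)
        for beta in combinations(sorted(alpha), b):
            events = complement | frozenset(beta)     # (nu) - (alpha) + (beta)
            if events:                                # skip the empty combination
                total += sign * p_at_least_one(atoms, events)
    return total


def check_formula_1_1_ii(n=4, seed=0):
    """True if (1.1), (ii) reproduces every atom probability with a >= 1."""
    atoms = random_atoms(n, seed)
    return all(rhs_formula_1_1_ii(atoms, n, alpha) == atoms[frozenset(alpha)]
               for a in range(1, n + 1)
               for alpha in combinations(range(n), a))
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point tolerance check.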
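Case 2: $m \ge 2$. We have, for $a \ge 1$ and $b \ge m - 1$, 

$$\sum_{c=0}^{a} (-1)^c \binom{a}{c} \binom{b+c}{m-1}^{-1} = \frac{m-1}{a+m-1} \binom{a+b}{a+m-1}^{-1},$$

which is easily proved by induction on a. 

Hence for $m \ge 2$, 

$$\sum_{b=d}^{n-a} (-1)^b \binom{n-a-d}{b-d} \binom{a+b-1}{m-1}^{-1} = (-1)^d \sum_{b'=0}^{n-a-d} (-1)^{b'} \binom{n-a-d}{b'} \binom{a+d-1+b'}{m-1}^{-1}$$

$$= (-1)^d\, \frac{m-1}{n-a-d+m-1} \binom{n-1}{n-a-d+m-1}^{-1} = (-1)^d\, \frac{m-1}{n-1} \binom{n-2}{a+d-m}^{-1}.$$

Substituting in (1') we get formula (1). 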
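The closed form of the series in Case 2 lends itself to a direct numerical check. The following sketch is an illustration under the reading of the series given above (the function names `series` and `closed_form` are hypothetical); it evaluates both sides exactly for all admissible small parameters.

```python
from fractions import Fraction
from math import comb


def series(n, a, d, m):
    """Sum over b = d..n-a of (-1)^b C(n-a-d, b-d) / C(a+b-1, m-1)."""
    return sum(Fraction((-1) ** b * comb(n - a - d, b - d), comb(a + b - 1, m - 1))
               for b in range(d, n - a + 1))


def closed_form(n, a, d, m):
    """(-1)^d (m-1)/(n-1) / C(n-2, a+d-m), valid for n >= a >= m >= 2."""
    return Fraction((-1) ** d * (m - 1), (n - 1) * comb(n - 2, a + d - m))


def check_series(max_n=8):
    """Compare both sides for every n >= a >= m >= 2 and 0 <= d <= n - a."""
    return all(series(n, a, d, m) == closed_form(n, a, d, m)
               for n in range(2, max_n + 1)
               for a in range(2, n + 1)
               for m in range(2, a + 1)
               for d in range(0, n - a + 1))
```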

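To derive formula (2.1) for a fixed $a$, $1 \le a \le n$, we sum (1.1, ii), which gives 

$$p_{[a]}((\nu)) = \sum_{(\alpha) \in (\nu)} p_{[(\alpha)]} = \sum_{\substack{b=0 \\ n-a+b \ne 0}}^{a} (-1)^{b-1} \sum_{(\alpha) \in (\nu)}\ \sum_{(\beta) \in (\alpha)} p_1((\nu) - (\alpha) + (\beta)).$$

Letting $(\nu) - (\alpha) + (\beta) = (\gamma)$, we get 

$$p_{[a]}((\nu)) = \sum_{c=\max(1,\,n-a)}^{n} (-1)^{c-n+a-1} \binom{c}{n-a} P_1^{(c)}((\nu)),$$

which is formula (2.1). 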
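As a sanity check of formula (2.1), the sketch below compares the probability of exactly $a$ occurrences with the alternating sum of the $P_1^{(b)}((\nu))$. It is only an illustration: the random probability function and the names `random_atoms`, `P1` and `rhs_formula_2_1` are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb
import random


def random_atoms(n, seed=0):
    """Exact random probabilities on the 2**n atoms (assumed test data)."""
    rng = random.Random(seed)
    atoms = [frozenset(i for i in range(n) if bits[i])
             for bits in product((0, 1), repeat=n)]
    weights = [rng.randint(1, 9) for _ in atoms]
    total = sum(weights)
    return {atom: Fraction(w, total) for atom, w in zip(atoms, weights)}


def P1(atoms, n, b):
    """P_1^{(b)}((nu)): sum over all b-combinations of P(at least one occurs)."""
    return sum(sum(p for occurred, p in atoms.items() if occurred & frozenset(beta))
               for beta in combinations(range(n), b))


def exactly(atoms, a):
    """p_[a]((nu)): probability that exactly a of the n events occur."""
    return sum(p for occurred, p in atoms.items() if len(occurred) == a)


def rhs_formula_2_1(atoms, n, a):
    """Right-hand side of (2.1) for 1 <= a <= n."""
    total = Fraction(0)
    for b in range(max(1, n - a), n + 1):
        sign = -1 if (b - n + a) % 2 == 0 else 1      # (-1)**(b - n + a - 1)
        total += sign * comb(b, n - a) * P1(atoms, n, b)
    return total


def check_formula_2_1(n=4, seed=0):
    atoms = random_atoms(n, seed)
    return all(rhs_formula_2_1(atoms, n, a) == exactly(atoms, a)
               for a in range(1, n + 1))
```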

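The following form of Poincaré's formula is of assistance in deriving (2): 

$$p_{[a]}((\nu)) = \sum_{c=a}^{n} (-1)^{c-a} \binom{c}{a} S_c((\nu)).$$

Substituting for $S_c((\nu))$ from the relation derived above (applied to each $(\gamma) \in (\nu)$ with $c$ members and summed over all such $(\gamma)$), we get 

$$p_{[a]}((\nu)) = \sum_{c=a}^{n} (-1)^{c-a} \binom{c}{a} \binom{c-1}{m-1}^{-1} \sum_{b=m}^{c} (-1)^{b-m} \binom{n-b}{c-b} P_m^{(b)}((\nu))$$

$$= \sum_{b=m}^{n} (-1)^{b-m} P_m^{(b)}((\nu)) \left\{ \sum_{c=\max(a,\,b)}^{n} (-1)^{c-a} \binom{c}{a} \binom{c-1}{m-1}^{-1} \binom{n-b}{c-b} \right\}.$$

Thus the problem reduces to the summation of the following series: 

$$L(n, a, b, m) = \sum_{c=\max(a,\,b)}^{n} (-1)^{c-a} \binom{c}{a} \binom{c-1}{m-1}^{-1} \binom{n-b}{c-b}.$$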

First, we have, for $z \ge 0$, $y \ge w$, 

$$\sum_{x=\max(0,\,1-w)}^{z} (-1)^x \binom{z}{x} (x+y)(x+y-1) \cdots (x+w) = \begin{cases} 0 & \text{if } y - w + 1 < z, \\[6pt] \dfrac{(-1)^z\, y!\, (y+1-w)!}{(z+w-1)!\, (y+1-w-z)!} & \text{if } y - w + 1 \ge z, \end{cases}$$

which may be easily proved by induction on z. 
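The identity can also be confirmed numerically. The sketch below is a hedged illustration restricted to the case $w \ge 1$ (so that the lower limit of summation is 0 and all factorials are defined); the names `lhs` and `rhs` are hypothetical.

```python
from math import comb, factorial, prod


def lhs(z, y, w):
    """Sum over x = 0..z of (-1)^x C(z, x) (x+w)(x+w+1)...(x+y), for w >= 1."""
    return sum((-1) ** x * comb(z, x) * prod(range(x + w, x + y + 1))
               for x in range(z + 1))


def rhs(z, y, w):
    """Closed form: 0 when the product has degree below z, else the factorial ratio."""
    k = y - w + 1                      # degree in x of the product
    if k < z:
        return 0
    numerator = (-1) ** z * factorial(y) * factorial(k)
    denominator = factorial(z + w - 1) * factorial(k - z)
    return numerator // denominator    # the division is exact in this range


def check_identity(limit=7):
    return all(lhs(z, y, w) == rhs(z, y, w)
               for z in range(0, limit)
               for w in range(1, limit)
               for y in range(w, w + limit))
```

The identity amounts to the statement that the z-th finite difference of a polynomial of degree below z vanishes.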
Next, we have 

$$L(n, a, b, m) = \frac{(m-1)!}{a!} \sum_{c=\max(a,\,b)}^{n} (-1)^{c-a} \binom{n-b}{c-b} \frac{c\,(c-m)!}{(c-a)!}$$

$$= (-1)^{b-a} \frac{(m-1)!}{a!} \sum_{c'=\max(0,\,a-b)}^{n-b} (-1)^{c'} \binom{n-b}{c'} \frac{(c'+b)(c'+b-m)!}{(c'+b-a)!}$$

$$= (-1)^{b-a} \frac{(m-1)!}{a!} \sum_{c'=\max(0,\,a-b)}^{n-b} (-1)^{c'} \binom{n-b}{c'} \frac{(c'+b-m+1)! + (m-1)(c'+b-m)!}{(c'+b-a)!}$$

$$= (-1)^{b-a} \frac{(m-1)!}{a!} \bigl\{ T(n, a, b, m) + (m-1)\, T(n, a, b, m+1) \bigr\},$$

where 

$$T(n, a, b, m) = \sum_{c=\max(0,\,a-b)}^{n-b} (-1)^c \binom{n-b}{c} \frac{(c+b-m+1)!}{(c+b-a)!} = \begin{cases} 0 & \text{if } b < n - a + m - 1, \\[6pt] \dfrac{(-1)^{n-b}\,(a-m+1)!\,(b-m+1)!}{(n-a)!\,(a+b-n-m+1)!} & \text{if } b \ge n - a + m - 1, \end{cases}$$

by the preceding formula. Thus we get the explicit expression for $L(n, a, b, m)$ 
given in formula (2), which is thereby proved. 
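The explicit expression for $L(n, a, b, m)$ can be compared against its defining series, and formula (2) itself can be tested on a small probability function. The sketch below is an illustration only; the random test data and the names `L_closed`, `L_series`, `P_m` and `check_formula_2` are assumptions made here. The closed form for $b > n - a + m - 1$ is also used at $b = n - a + m - 1$, where it takes the same value as the middle case.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial
import random


def L_closed(n, a, b, m):
    """Explicit L(n, a, b, m) of formula (2), for n >= a >= m >= 2 and m <= b <= n."""
    if b < n - a + m - 1:
        return Fraction(0)
    num = ((-1) ** (n - a) * factorial(m - 1) * factorial(b - m)
           * factorial(a - m) * (a * b - n * (m - 1)))
    den = factorial(a) * factorial(n - a) * factorial(a + b - n - m + 1)
    return Fraction(num, den)


def L_series(n, a, b, m):
    """Series form: sum over c of (-1)^(c-a) C(c,a) C(n-b,c-b) / C(c-1,m-1)."""
    return sum(Fraction((-1) ** (c - a) * comb(c, a) * comb(n - b, c - b),
                        comb(c - 1, m - 1))
               for c in range(max(a, b), n + 1))


def check_L(max_n=7):
    return all(L_closed(n, a, b, m) == L_series(n, a, b, m)
               for n in range(2, max_n + 1)
               for a in range(2, n + 1)
               for m in range(2, a + 1)
               for b in range(m, n + 1))


def P_m(atoms, n, b, m):
    """P_m^{(b)}((nu)): sum over b-combinations of P(at least m of them occur)."""
    return sum(sum(p for occurred, p in atoms.items()
                   if len(occurred & frozenset(beta)) >= m)
               for beta in combinations(range(n), b))


def check_formula_2(n=4, seed=0):
    """Test formula (2) on randomly weighted atoms (assumed test data)."""
    rng = random.Random(seed)
    outcomes = [frozenset(i for i in range(n) if bits[i])
                for bits in product((0, 1), repeat=n)]
    weights = [rng.randint(1, 9) for _ in outcomes]
    atoms = {o: Fraction(w, sum(weights)) for o, w in zip(outcomes, weights)}
    for a in range(2, n + 1):
        exact = sum(p for occurred, p in atoms.items() if len(occurred) == a)
        for m in range(2, a + 1):
            rhs = sum((-1) ** (b - m) * L_closed(n, a, b, m) * P_m(atoms, n, b, m)
                      for b in range(m, n + 1))
            if rhs != exact:
                return False
    return True
```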

The derivations of formulas (3) and (4) are similar to the above and may be 
omitted. 

Now we can give the essential argument for Theorems 1-4. It is evident 
that for any E in £, we have 

$$P(E) = \sum p_{[(\alpha)]},$$

where the summation extends to certain combinations $(\alpha) \in (\nu)$. Substituting 
from formula (1.1) we get Theorem 1; substituting from formula (3) we get 
Theorem 3. Next, for any E in S, we have 

$$P(E) = \sum p_{[a]}((\nu)),$$

where the summation extends to certain values of a. Substituting from formula 
(1.1), (i) and formula (2) we get Theorem 2; substituting from formula (3), (i) 
and formula (4) we get Theorem 4. We may note these proofs are "constructive". 

It remains to prove the uniqueness of the coefficients in Theorems 1-4. For 
Broderick’s theorem this has been done by Fréchet 3 , by introducing “independent 
events”. Our proof will be based on the conditions of existence, also 
initiated by Fréchet 6 , for the systems $p_1((\alpha))$, $p_{[1]}((\alpha))$, $P_1^{(a)}((\nu))$, $P_{[1]}^{(a)}((\nu))$. 

The conditions of existence of the system $p_1((\alpha))$ have been given by the 
author in the paper 1 , though the proof there is not quite complete. 


1. Conditions of existence of the system $P_1^{(a)}((\nu))$. Given n quantities $Q_1^{(a)}$, 
$1 \le a \le n$; what are the necessary and sufficient conditions that they may be 
the system of $P_1^{(a)}((\nu))$'s, $1 \le a \le n$, of a probability function defined over 
S$(1, \dots, n)$? 

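From formula (1.1), (i) and formula (2) it is evident that necessary conditions 
are, for $1 \le a \le n$, 

(3) $$\sum_{\substack{b=n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} Q_1^{(b)} \ge 0, \qquad 1 - Q_1^{(n)} \ge 0,$$

and 

(4) $$\sum_{a=1}^{n}\ \sum_{\substack{b=n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} Q_1^{(b)} + 1 - Q_1^{(n)} = 1.$$

The last condition can be re-written as 

$$\sum_{b=1}^{n} Q_1^{(b)} \sum_{a=\max(1,\,n-b)}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} + 1 - Q_1^{(n)} = 1,$$

which reduces to the identity 1 = 1. 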


6 “Conditions d’existence de systèmes d’événements associés à certaines probabilités,” 
Jour. de Math., 1940. However, our interpretation of the term would mean instead “conditions 
of existence of a probability function defined over a Borel set of events, etc.” 

To show that the conditions (3) are sufficient, put 

$$p_{[a]} = \sum_{\substack{b=n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} Q_1^{(b)} \quad (1 \le a \le n), \qquad p_{[0]} = 1 - Q_1^{(n)}.$$

By (3) and (4) we have, for $0 \le a \le n$, 

$$p_{[a]} \ge 0 \quad \text{and} \quad \sum_{a=0}^{n} p_{[a]} = 1.$$

Hence they are actually the $p_{[a]}((\nu))$’s of a probability function. We want to 
show that the $P_1^{(b)}((\nu))$'s of this probability function coincide with the given 
$Q_1^{(b)}$’s, so that this is the probability function we seek. We have, 

p^w)- z vm) = ZPM z (;)(;"?} 

(0) « (0 fl —1 A-*max(l l b—n+fl) Vv \0 tl/ 

= V / £ (_!)«-«+»-»/ c \ , /a\/n-aYl (1) 

c -0 \o-rnai(l.B-«l \7l 0/ *-mon( 1 . 6 -«+a) \7l/ \b — h/J C 

Now the series in curl brackets 

(»-«)© 

^rt+a—1 / C \ (n ~~~ g\ 

a—mnx(l,n—c) \fl — 1 CL/ \ 6 J 

If c = n, the last 

(»->)(”: *) 

■(?)-©§<-«"(’5 $r„: 

If c < 7i, we have 

= 0 + (-1)' 2 ( c V 71 7 a ) 

a**n—c \W — dj \ b / 

-{j; 

Therefore 

- er 




2. Conditions of existence of the system $p_{[1]}((\alpha))$. Given $2^n - 1$ quantities 
$q_{[1]}((\alpha))$, $(\alpha) \in (\nu)$, $a \ge 1$, what are the necessary and sufficient conditions that 
they may be the system of $p_{[1]}((\alpha))$’s of a probability function defined over 
£$(1, \dots, n)$? 

From formula (3) it is evident that necessary conditions are 


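(5) $$\frac{1}{n} \sum_{c=1}^{n}\ \sum_{d=\max(0,\,c-a)}^{\min(c,\,n-a)} (-1)^{c+d-1} \binom{n-1}{a+d-1}^{-1} \sum_{\substack{(\delta) \in (\nu)-(\alpha) \\ (\gamma)-(\delta) \in (\alpha)}} q_{[1]}((\gamma) - (\delta) + (\delta)) \ge 0, \qquad (\alpha) \in (\nu),\ a \ge 1,$$

$$1 - \frac{1}{n} \sum_{c=1}^{n} \binom{n-1}{c-1}^{-1} \sum_{(\gamma) \in (\nu)} q_{[1]}((\gamma)) \ge 0,$$

and 

(6) $$1 - \frac{1}{n} \sum_{c=1}^{n} \binom{n-1}{c-1}^{-1} \sum_{(\gamma) \in (\nu)} q_{[1]}((\gamma)) + \frac{1}{n} \sum_{\substack{(\alpha) \in (\nu) \\ a \ge 1}}\ \sum_{c=1}^{n}\ \sum_{d=\max(0,\,c-a)}^{\min(c,\,n-a)} (-1)^{c+d-1} \binom{n-1}{a+d-1}^{-1} \sum_{\substack{(\delta) \in (\nu)-(\alpha) \\ (\gamma)-(\delta) \in (\alpha)}} q_{[1]}((\gamma) - (\delta) + (\delta)) = 1.$$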


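Consider, for a fixed c, the sum 

$$\sum_{(\alpha) \in (\nu)}\ \sum_{d=\max(0,\,c-a)}^{\min(c,\,n-a)} (-1)^d \binom{n-1}{a+d-1}^{-1} \sum_{\substack{(\delta) \in (\nu)-(\alpha) \\ (\gamma)-(\delta) \in (\alpha)}} q_{[1]}((\gamma) - (\delta) + (\delta)),$$

extended to all combinations $(\alpha)$, $a = 0, 1, \dots, n$. For a fixed $d$, the number of ways of writing $(\gamma) = (\gamma) - (\delta) + (\delta)$ is $\binom{c}{d}$; 
then since $(\gamma) - (\delta) \in (\alpha)$ but $(\alpha) - ((\gamma) - (\delta)) \in (\nu) - (\gamma)$, the number of 
$(\alpha)$ is $\binom{n-c}{a-c+d}$. Thus the coefficient of $q_{[1]}((\gamma))$ in the sum is 

$$\sum_{a=0}^{n}\ \sum_{d=\max(0,\,c-a)}^{\min(c,\,n-a)} (-1)^d \binom{c}{d} \binom{n-c}{a-c+d} \binom{n-1}{a+d-1}^{-1} = \sum_{e=0}^{n-c} \binom{n-c}{e} \binom{n-1}{c+e-1}^{-1} \sum_{d=0}^{c} (-1)^d \binom{c}{d} = 0,$$

where $e = a - c + d$. Therefore the condition (6) reduces to the identity 1 = 1. 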

To show that conditions (5) are sufficient, put the left-hand sides of (5) equal 
to $p_{[(\alpha)]}$ and $p_{[(0)]}$ respectively. Then 


P<(<0) — Z P[fa)+(fi) i 

(0)e(t») — (<*) 


( 7 ) 


i n n—a mm(c,n- a—b) / ■» \ — 1 

= iZ(-i) c_1 Z Z (-i) d ( a 1 d l A ) Z 

^ c—l b-*0 d— max(G,e— a—b) l ® -1/ (0) *(><) — (a) 


Z 5[u((y) — (4) + (4)). 

<S).(iO-(b)-<« 

(T)-H)«(a)+(« 
Let $(\gamma) = (\gamma) - (\varphi) + (\varphi)$, where $(\varphi) \in (\alpha)$, $(\gamma) - (\varphi) \in (\nu) - (\alpha)$. Then the 
sum in the curly brackets can be written, by a combinatorial calculation, as 


mln(a»c) (*— a m.in(d— f.n—a -1 


E 

f-0 


E _ 

b-0 d«max(0 


i )d A-/Vn-a-c+/Y 

(fe-/-6, 1 ' V A 6 - c + d + /Ao 


[*) 1 (a) 

(7)-(#)><y)-(a) 


n — 

+ b + d 


1 Y 

d — l) 


?U)((t) ~ (0) + ( 0 )). 


T h T d —■ 
+ 0-/- 


:)■ 


The sum in the last curly brackets is 

/ „ i \-l n-a min<c-/, n-a-bl / f \ / 

C+.=J-i) 5_-0C 

Inverting the order of summations, 

( n- 1 , y/c -A "fT* /o + b + d - l\ 

\a + C — / — 1/ d-mm(0,c-/-n+a) \ d ) £,_ c _y—d \a + C— / — 1/ 

«( »-l V mln< 2'”" a> (_!)<’( C -/V n \ 

Vfl + C - / - l) rf—max(ofc—/—fi-fct) ^ J W A“ + « " // 


if / = c, 


-c.+:_X+*- iES^C -0 - f if 

lo if f c. 


Hence (7) reduces to 


p«.» “^Ec-ir 1 E fl W ((r)). 

a o-l ( 7 )«(<r' 


Then 


&((«)) 


= E 

(0)>«(a) 


Pm) 




?[!]((«)) - E (-^‘"^((u)) 

b-1 


~ dSS. > {s (-1)6 ~ d (b - d)} ?m((5)) = ?tu((a)) ' 

1 d,dO \ f j 

The conditions of existence of the system $P_{[1]}^{(a)}((\nu))$, $1 \le a \le n$, are similarly 
deduced from formula (3), (i) and formula (4) with m = 1. 

Now we can prove the uniqueness of the coefficients in Theorems 1-4. Since 
the proofs are all exactly similar, we take Theorem 2. Suppose, if possible, 
there exists another system of coefficients $c'_a$, $0 \le a \le n$, so that 

$$P(E) = c'_0 + \sum_{a=1}^{n} c'_a P_1^{(a)}((\nu)).$$

Taking the difference, we get a linear polynomial in the variables $P_1^{(a)}((\nu))$, 
$1 \le a \le n$, which must vanish: 

(8) $$(c_0 - c'_0) + \sum_{a=1}^{n} (c_a - c'_a) P_1^{(a)}((\nu)) = 0,$$

for all “admissible” values of the variables. These values, say $Q_1^{(a)}$, are precisely 
those which satisfy the conditions (3). 

It is evidently easy to construct a system of $Q_1^{(a)}$, $1 \le a \le n$, which satisfy the 
conditions (3) written with the sign of strict inequality. Hence in a sufficiently 
small neighborhood of the point $(Q_1^{(1)}, Q_1^{(2)}, \dots, Q_1^{(n)})$ in the n-dimensional 
space these strict inequalities still hold. Hence the polynomial vanishes 
in this neighborhood and so must vanish identically; that is, 

$$c_a = c'_a, \qquad 0 \le a \le n.$$



ON THE EFFICIENT DESIGN OF STATISTICAL 
INVESTIGATIONS 

By Abraham Wald 

Columbia University 

1. Introduction. A theory of efficient design of statistical investigations has 
been developed by R. A. Fisher 1 and his followers mainly in connection with 
agricultural experimentation. However, the same methods can be applied to 
other fields also. All statistical designs treated in the aforementioned theory 
refer to problems of testing linear hypotheses. By testing a linear hypothesis 
we mean the following problem: Let $y_1, \dots, y_N$ be N independently and 
normally distributed variates with a common variance $\sigma^2$. It is assumed that 
the expected value of $y_\alpha$ is given by 

(1) $$E(y_\alpha) = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \cdots + \beta_p x_{p\alpha} \qquad (\alpha = 1, \dots, N)$$

where the quantities $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ are known constants and 
$\beta_1, \dots, \beta_p$ are unknown constants. The coefficients $\beta_1, \dots, \beta_p$ are called 
the population regression coefficients of y on $x_1, x_2, \dots,$ and $x_p$, respectively. 
The hypothesis that the unknown regression coefficients $\beta_1, \dots, \beta_p$ satisfy a 
set of linear equations 

(2) $$g_{i1}\beta_1 + \cdots + g_{ip}\beta_p = g_i \qquad (i = 1, \dots, r;\ r \le p),$$

is called a linear hypothesis. The problem under consideration is that of testing 
the hypothesis (2) on the basis of the observed values $y_1, \dots, y_N$. 

In many cases the experimenter has a certain amount of freedom in the choice 
of the values $x_{i\alpha}$. The efficiency of the test is greatly affected by the values of 
$x_{i\alpha}$. The statistical investigation is efficiently designed if the values $x_{i\alpha}$ are 
chosen so that the sensitivity of the test is maximized. Let us illustrate this 
by a simple example. Suppose that x and y have a bivariate normal distribution 
and we want to test the hypothesis that the regression coefficient $\beta$ of y on x 
has a particular value $\beta_0$. Suppose, furthermore, that the test has to be carried 
out on the basis of N pairs of observations $(x_1, y_1), \dots, (x_N, y_N)$, where the 
experiments are performed in such a way that $x_1, \dots, x_N$ are not random variables 
but have predetermined fixed values. It is known that the variance of 
the least square estimate b of $\beta$ is inversely proportional to $\sum_{\alpha=1}^{N} (x_\alpha - \bar{x})^2$, where 
$\bar{x} = (x_1 + \cdots + x_N)/N$. Hence, if we can freely choose the values $x_1, \dots, x_N$ 
in a certain domain D, the greatest sensitivity of the test will be achieved by 
choosing $x_1, \dots, x_N$ so that $\sum (x_\alpha - \bar{x})^2$ becomes a maximum. 
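The effect of the choice of $x_1, \dots, x_N$ on the variance of the slope estimate can be illustrated with a short computation. The sketch below is only an illustration: the domain $[-1, 1]$, the two candidate designs and the function name `slope_variance_factor` are assumptions introduced here, and the factor returned is the variance of b up to the unknown multiplier $\sigma^2$.

```python
def slope_variance_factor(xs):
    """Return 1 / sum((x - mean)^2), i.e. Var(b) / sigma^2 for the slope estimate."""
    mean = sum(xs) / len(xs)
    return 1.0 / sum((x - mean) ** 2 for x in xs)


# Two hypothetical designs with N = 6 points in the domain [-1, 1].
endpoints = [-1, -1, -1, 1, 1, 1]               # half the points at each end
spread = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]      # equally spaced points

# The design concentrated at the ends of the domain gives the smaller variance.
factor_endpoints = slope_variance_factor(endpoints)
factor_spread = slope_variance_factor(spread)
```

Placing the observations at the extremes of the domain maximizes the sum of squared deviations, which is the choice the text singles out as most sensitive.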

In the next section we will introduce a measure of the efficiency of the design 
of a statistical investigation for testing a linear hypothesis. In sections 3 and 4 
it will be shown that some well known experimental designs, used widely in 
agricultural experimentation, are most efficient in the sense of the definition 
given in section 2. 

1 See for instance R. A. Fisher, The Design of Experiments, Oliver and Boyd, London, 


2. A measure of the efficiency of the design of a statistical investigation for 
testing a linear hypothesis. The hypothesis (2) can be reduced by a suitable 
linear transformation to the canonical form 

(3) $$\beta_1 = \beta_2 = \cdots = \beta_r = 0 \qquad (r \le p).$$

Hence, we can restrict ourselves without loss of generality to the consideration 
of the hypothesis (3). 

Denote $\sum_{\alpha=1}^{N} x_{i\alpha} x_{j\alpha}$ by $a_{ij}$ and let the matrix $\| c_{ij} \|$ be the inverse of the matrix 
$\| a_{ij} \|$ $(i, j = 1, \dots, p)$. Denote by $b_i$ the least square estimate of 
$\beta_i$ $(i = 1, \dots, p)$. It is known that the estimates $b_1, \dots, b_p$ have a joint 
normal distribution with mean values $\beta_1, \dots, \beta_p$, respectively. It is furthermore 
known that the covariance of $b_i$ and $b_j$ is equal to $c_{ij}\sigma^2$. The statistic used 
for testing the hypothesis (3) is given by 

(4) $$F = \frac{N - p}{r} \cdot \frac{\sum_{l=1}^{r} \sum_{m=1}^{r} a^*_{lm} b_l b_m}{\sum_{\alpha=1}^{N} (y_\alpha - b_1 x_{1\alpha} - \cdots - b_p x_{p\alpha})^2},$$

where $\| a^*_{lm} \|$ is the inverse of $\| c_{lm} \|$ $(l, m = 1, \dots, r)$. The statistic F 
has the F-distribution with r and N − p degrees of freedom. The critical region 
for testing the hypothesis (3) is given by the inequality 

(5) $$F \ge F_0,$$

where the constant $F_0$ is determined so that the probability that $F \ge F_0$ (calculated 
under the assumption that (3) holds) is equal to the level of significance 
we wish to have. 

It is known that the power function 2 of the critical region (5) depends only 
on the single parameter 

(6) $$\lambda = \frac{1}{\sigma^2} \sum_{l=1}^{r} \sum_{m=1}^{r} a^*_{lm} \beta_l \beta_m.$$

Furthermore this power function is a monotonically increasing function of $\lambda$. 
The coefficients $a^*_{lm}$ are functions of the quantities $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 
1, \dots, N)$. The choice of the values $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ is 
the better the greater the corresponding value of $\lambda$. If r = 1, the expression $\lambda$ 
reduces to $a^*_{11}\beta_1^2/\sigma^2$. Hence, if r = 1, we maximize $\lambda$ by maximizing $a^*_{11}$. Since 
$a^*_{11} = 1/c_{11}$, we maximize $\lambda$ by minimizing $c_{11}$. Thus, if r = 1, we can say that 
we obtain the most powerful test by minimizing $c_{11}$, i.e. by minimizing the 
variance of $b_1$. If r > 1, the difficulty arises that no set of values 
$x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ can be found for which $\lambda$ becomes a maximum 
irrespective of the values of the unknown parameters $\beta_1, \dots, \beta_r$. Hence, if 
r > 1, we have to be satisfied with some compromise solution. For this purpose 
let us consider the unit sphere 

(7) $$\beta_1^2 + \cdots + \beta_r^2 = 1,$$

in the space of the parameters $\beta_1, \dots, \beta_r$. It is known that the smallest root 
in $\rho$ of the determinantal equation 

(8) $$\begin{vmatrix} a^*_{11} - \rho & a^*_{12} & \cdots & a^*_{1r} \\ a^*_{21} & a^*_{22} - \rho & \cdots & a^*_{2r} \\ \vdots & \vdots & & \vdots \\ a^*_{r1} & a^*_{r2} & \cdots & a^*_{rr} - \rho \end{vmatrix} = 0,$$

is equal to the minimum value of $\sigma^2\lambda$ on the unit sphere (7). Similarly the 
greatest root of (8) is equal to the maximum value of $\sigma^2\lambda$ on the sphere (7). The 
compromise solution of maximizing the smallest root of (8) seems to be a very 
reasonable one. However, for the sake of certain mathematical simplifications, 
we propose to maximize the product of the r roots of (8). Since the product of 
the roots of (8) is equal to the determinant 

(9) $$\begin{vmatrix} a^*_{11} & \cdots & a^*_{1r} \\ \vdots & & \vdots \\ a^*_{r1} & \cdots & a^*_{rr} \end{vmatrix},$$

we have to maximize the determinant (9). The value of the determinant 
$| c_{lm} |$ $(l, m = 1, \dots, r)$ is the reciprocal of that of (9). Hence we maximize 
(9) by minimizing the determinant $| c_{lm} |$. The generalized variance of the set 
of variates $b_1, \dots, b_r$ is equal to the product of $\sigma^{2r}$ and the determinant $| c_{lm} |$. 
Thus, our result can be expressed as follows: The optimum choice of the values 
of $x_{i\alpha}$ is that for which the generalized variance of the variates $b_1, \dots, b_r$ 
becomes a minimum. 

2 See for instance P. C. Tang, "The power function of the analysis of variance tests," 
Stat. Res. Mem., Vol. II, 1938. 
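The criterion of minimizing the determinant $| c_{lm} |$ can be made concrete with a small computation. The sketch below is only an illustration under stated assumptions: the set of admissible designs is taken to be a finite list of candidates, the regressors of each design are given as rows of a list, and the helper names (`det`, `inverse`, `generalized_variance_factor`, `variance_ratio`) are hypothetical. It computes $| c_{lm} |$ $(l, m = 1, \dots, r)$ exactly and compares each candidate with the smallest value found.

```python
from fractions import Fraction


def det(M):
    """Determinant of a square matrix of Fractions by Gaussian elimination."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return d


def inverse(M):
    """Inverse of a nonsingular matrix of Fractions by Gauss-Jordan elimination."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for i in range(n):
        pivot = next(r for r in range(i, n) if A[r][i] != 0)
        A[i], A[pivot] = A[pivot], A[i]
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        for r in range(n):
            if r != i and A[r][i] != 0:
                f = A[r][i]
                A[r] = [vr - f * vi for vr, vi in zip(A[r], A[i])]
    return [row[n:] for row in A]


def generalized_variance_factor(X, r):
    """|c_lm| (l, m = 1..r), where c is the inverse of a_ij = sum_alpha x_i x_j."""
    p = len(X[0])
    A = [[sum(Fraction(row[i]) * Fraction(row[j]) for row in X) for j in range(p)]
         for i in range(p)]
    C = inverse(A)
    return det([C[i][:r] for i in range(r)])


def variance_ratio(X, r, candidates):
    """Smallest |c_lm| among the candidate designs divided by |c_lm| of X."""
    best = min(generalized_variance_factor(D, r) for D in candidates)
    return best / generalized_variance_factor(X, r)


# Hypothetical example: slope (tested, r = 1) plus a constant term, N = 6.
endpoints = [[x, 1] for x in (-1, -1, -1, 1, 1, 1)]
spread = [[Fraction(k, 5), 1] for k in (-5, -3, -1, 1, 3, 5)]
ratio_spread = variance_ratio(spread, 1, [endpoints, spread])   # Fraction(7, 15)
```

With r = 1 the determinant reduces to $c_{11}$, so the example reproduces the single-regressor illustration of the Introduction.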

Any set of pN values $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ can be represented 
by a point in the pN-dimensional Cartesian space. Denote by D the set of all 
points in the pN-dimensional space which we are free to choose. If N is fixed 
and if any point of D can be equally well chosen, the following two definitions 
seem to be appropriate: 

Definition 1. Denote by c the minimum value of the determinant $| c_{lm} |$ 
$(l, m = 1, \dots, r)$ in the domain D. Then the ratio $c / | c_{lm} |$ is called the efficiency 
of the design of the statistical investigation for testing the hypothesis (3). 

Definition 2. The design of the statistical investigation for testing the hypothesis 
(3) is said to be most efficient if its efficiency is equal to 1. 

3. Efficiency of the Latin square design. A widely used and important 
design in agricultural experimentation is the so-called Latin square. Suppose we 
wish to find out by experimentation whether there is any significant difference 
among the yields of m different varieties $v_1, \dots, v_m$. For this purpose the 
experimental area is subdivided into $m^2$ plots lying in m rows and m columns 
and each plot is assigned to one of the varieties $v_1, \dots, v_m$. If each variety 
appears exactly once in each row and exactly once in each column, we have a 
Latin square arrangement. Denote by $y_{ijk}$ the yield of the variety $v_k$ on the 
plot which lies in the i-th row and j-th column. The subscript k is, of course, 
a single valued function of the subscripts i and j, since to each plot only one 
variety is assigned. The following assumptions are made: the variates $y_{ijk}$ 
are independently and normally distributed with a common variance $\sigma^2$ and the 
expected value of $y_{ijk}$ is given by 

(10) $$E(y_{ijk}) = \mu_i + \nu_j + \rho_k.$$

The parameters $\sigma^2$, $\mu_i$, $\nu_j$ and $\rho_k$ are unknown. The hypothesis to be tested 
is the hypothesis that variety has no effect on yield, i.e. 

(11) $$\rho_1 = \rho_2 = \cdots = \rho_m.$$

We associate the positive integer $\alpha(i, j) = (i - 1)m + j$ with the plot which 
lies in the i-th row and j-th column $(i, j = 1, \dots, m)$. It is clear that for 
any positive integer $\alpha \le m^2$ there exists exactly one plot, i.e. exactly one pair 
of values i and j, such that $\alpha = \alpha(i, j)$. In the following discussions the symbol 
$y_\alpha$ $(\alpha = 1, \dots, m^2)$ will denote the yield $y_{ijk}$, where the indices i and j are determined 
so that $\alpha(i, j) = \alpha$. The plot in the i-th row and j-th column will be 
called the $\alpha$-th plot where $\alpha = \alpha(i, j)$. 

We define the symbols $t_{i\alpha}$, $u_{j\alpha}$, $z_{k\alpha}$ $(i, j, k = 1, \dots, m;\ \alpha = 1, \dots, m^2)$ 
as follows: $t_{i\alpha} = 1$ if the $\alpha$-th plot lies in the i-th row, and $t_{i\alpha} = 0$ otherwise. 
Similarly $u_{j\alpha} = 1$ if the $\alpha$-th plot lies in the j-th column, and $u_{j\alpha} = 0$ otherwise. 
Finally $z_{k\alpha} = 1$ if the k-th variety is assigned to the $\alpha$-th plot, and $z_{k\alpha} = 0$ 
otherwise. Then equation (10) can be written as 

(12) $$E(y_\alpha) = \mu_1 t_{1\alpha} + \cdots + \mu_m t_{m\alpha} + \nu_1 u_{1\alpha} + \cdots + \nu_m u_{m\alpha} + \rho_1 z_{1\alpha} + \cdots + \rho_m z_{m\alpha}.$$

Denote the arithmetic means $\frac{1}{m^2} \sum_{\alpha=1}^{m^2} t_{i\alpha}$, $\frac{1}{m^2} \sum_{\alpha=1}^{m^2} u_{i\alpha}$, and $\frac{1}{m^2} \sum_{\alpha=1}^{m^2} z_{i\alpha}$ by $\bar{t}_i$, $\bar{u}_i$, and 
$\bar{z}_i$, respectively. Let $t'_{i\alpha} = t_{i\alpha} - \bar{t}_i$, $u'_{i\alpha} = u_{i\alpha} - \bar{u}_i$, $z'_{i\alpha} = z_{i\alpha} - \bar{z}_i$, $\mu'_i = 
\mu_i - \mu_m$, $\nu'_i = \nu_i - \nu_m$ and $\rho'_i = \rho_i - \rho_m$ for $i = 1, \dots, m - 1$. Let furthermore 
$w_\alpha = 1$ for $\alpha = 1, \dots, m^2$. Then we have 

(13) $$\begin{aligned} t_{i\alpha} &= t'_{i\alpha} + \bar{t}_i w_\alpha, \quad u_{i\alpha} = u'_{i\alpha} + \bar{u}_i w_\alpha, \quad z_{i\alpha} = z'_{i\alpha} + \bar{z}_i w_\alpha \qquad (i = 1, \dots, m - 1), \\ t_{m\alpha} &= (1 - \bar{t}_1 - \cdots - \bar{t}_{m-1}) w_\alpha - t'_{1\alpha} - \cdots - t'_{m-1,\alpha}, \\ u_{m\alpha} &= (1 - \bar{u}_1 - \cdots - \bar{u}_{m-1}) w_\alpha - u'_{1\alpha} - \cdots - u'_{m-1,\alpha}, \\ z_{m\alpha} &= (1 - \bar{z}_1 - \cdots - \bar{z}_{m-1}) w_\alpha - z'_{1\alpha} - \cdots - z'_{m-1,\alpha}. \end{aligned}$$

From (12) and (13) we obtain 

(14) $$E(y_\alpha) = \xi w_\alpha + \sum_{i=1}^{m-1} \mu'_i t'_{i\alpha} + \sum_{i=1}^{m-1} \nu'_i u'_{i\alpha} + \sum_{i=1}^{m-1} \rho'_i z'_{i\alpha},$$

where 

$$\xi = \sum_{i=1}^{m-1} \mu'_i \bar{t}_i + \sum_{i=1}^{m-1} \nu'_i \bar{u}_i + \sum_{i=1}^{m-1} \rho'_i \bar{z}_i + \mu_m + \nu_m + \rho_m.$$

The hypothesis (11) can be written as 

(15) $$\rho'_1 = \rho'_2 = \cdots = \rho'_{m-1} = 0.$$

This is a linear hypothesis in canonical form as given in (3). The values $z'_{i\alpha}$ 
$(i = 1, \dots, m - 1;\ \alpha = 1, \dots, m^2)$ depend on the way in which the varieties 
$v_1, \dots, v_m$ are assigned to the $m^2$ plots. We will show that we obtain a most 
efficient design if we distribute the varieties over the $m^2$ plots in a Latin square 
arrangement, i.e. if each variety appears exactly once in each row and exactly 
once in each column. 

Let $q_{1\alpha} = w_\alpha$, $q_{i+1,\alpha} = t'_{i\alpha}$ $(i = 1, \dots, m - 1)$, $q_{m+j,\alpha} = u'_{j\alpha}$ $(j = 
1, \dots, m - 1)$ and $q_{2m-1+k,\alpha} = z'_{k\alpha}$ $(k = 1, \dots, m - 1)$. Denote $\sum_{\alpha=1}^{m^2} q_{i\alpha} q_{j\alpha}$ 
by $a_{ij}$ $(i, j = 1, 2, \dots, 3m - 2)$ and let the matrix $\| c_{ij} \|$ be the inverse of the 
matrix $\| a_{ij} \|$ $(i, j = 1, \dots, 3m - 2)$. Let us denote by $\Delta$ the determinant 
$| a_{ij} |$ $(i, j = 1, \dots, 3m - 2)$, by $\Delta_1$ the determinant $| a_{ij} |$ $(i, j = 
1, \dots, 2m - 1)$, by $\Delta_2$ the determinant $| a_{ij} |$ $(i, j = 2m, \dots, 3m - 2)$ and by 
$\Delta'_2$ the determinant $| c_{ij} |$ $(i, j = 2m, \dots, 3m - 2)$. We have to show that for 
the Latin square arrangement $\Delta'_2$ becomes a minimum. From a known theorem 3 
about determinants it follows that 

(16) $$\Delta'_2 = \Delta_1 / \Delta.$$

Hence, we have merely to show that $\Delta / \Delta_1$ becomes a maximum for the Latin 
square arrangement. Denote by $\bar{\Delta}$, $\bar{\Delta}_1$ and $\bar{\Delta}_2$ the values taken by $\Delta$, $\Delta_1$ and 
$\Delta_2$, respectively, in the case of a Latin square arrangement. Since, for the Latin 
square arrangement, as is known, 

$$\sum_{\alpha=1}^{m^2} t'_{i\alpha} z'_{k\alpha} = \sum_{\alpha=1}^{m^2} u'_{j\alpha} z'_{k\alpha} = \sum_{\alpha=1}^{m^2} z'_{k\alpha} w_\alpha = 0 \qquad (i, j, k = 1, \dots, m - 1),$$

we have 

(17) $$\frac{\bar{\Delta}}{\bar{\Delta}_1} = \bar{\Delta}_2.$$

Since the matrix $\| a_{ij} \|$ $(i, j = 1, \dots, 3m - 2)$ is positive definite we have 

(18) $$\frac{\Delta}{\Delta_1} \le \Delta_2.$$

3 See M. Bôcher, Introduction to Higher Algebra, 1931, pp. 31. 

Because of (17) and (18) the Latin square design is proved to be most efficient 
if we show that $\Delta_2 \le \bar{\Delta}_2$. 

Denote by $\Delta^*_2$ the m-rowed determinant $| a_{ij} |$ $(i, j = 1, 2m, 2m + 1, \dots, 
3m - 2)$. Since $a_{1j} = 0$ for $j \ne 1$, we have 

(19) $$\Delta^*_2 = a_{11} \Delta_2 = m^2 \Delta_2.$$

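Denote $\sum_{\alpha=1}^{m^2} z_{i\alpha} z_{j\alpha}$ by $b_{ij}$ $(i, j = 1, \dots, m)$. Then 

(20) $$b_{ij} = 0 \ \text{ for } i \ne j, \quad \text{and} \quad b_{ii} = N_i,$$

where $N_i$ denotes the number of plots to which the variety $v_i$ has been assigned. 
Because of (20) we have 

(21) $$\begin{vmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mm} \end{vmatrix} = N_1 N_2 \cdots N_m.$$

According to (13) we have 

(22) $$\begin{aligned} z'_{i\alpha} + \bar{z}_i w_\alpha &= z_{i\alpha} \qquad (i = 1, \dots, m - 1), \\ -z'_{1\alpha} - \cdots - z'_{m-1,\alpha} + w_\alpha (1 - \bar{z}_1 - \cdots - \bar{z}_{m-1}) &= z_{m\alpha}. \end{aligned}$$

The determinant of these equations is given by 

(23) $$\lambda = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & \bar{z}_1 \\ 0 & 1 & 0 & \cdots & 0 & \bar{z}_2 \\ \vdots & & & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \bar{z}_{m-1} \\ -1 & -1 & -1 & \cdots & -1 & \zeta \end{vmatrix},$$

where $\zeta = 1 - \bar{z}_1 - \bar{z}_2 - \cdots - \bar{z}_{m-1}$. It is easy to verify that 

(24) $$\lambda = 1.$$

From (21), (22) and (24) it follows that 

(25) $$\Delta^*_2 = N_1 N_2 \cdots N_m.$$

Hence, from (19) we obtain 

(26) $$\Delta_2 = N_1 N_2 \cdots N_m / m^2.$$

In the case of a Latin square design we have $N_1 = N_2 = \cdots = N_m = m$. Hence 

(27) $$\bar{\Delta}_2 = m^{m-2}.$$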
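Formulas (26) and (27) can be checked numerically by building the centred indicator variables $z'_{k\alpha}$ for a given assignment of varieties to plots. The sketch below is an illustration only: the cyclic Latin square, the unbalanced assignment, and the names `det` and `delta_2` are assumptions introduced here.

```python
from fractions import Fraction


def det(M):
    """Determinant of a square matrix of Fractions by Gaussian elimination."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return d


def delta_2(assignment, m):
    """Determinant |sum_alpha z'_i z'_j| (i, j = 1..m-1) for a plot assignment.

    assignment[alpha] is the variety index (0..m-1) planted on plot alpha.
    """
    plots = m * m
    centred = []
    for k in range(m - 1):
        z = [Fraction(int(v == k)) for v in assignment]
        mean = sum(z) / plots
        centred.append([x - mean for x in z])
    matrix = [[sum(x * y for x, y in zip(centred[i], centred[j]))
               for j in range(m - 1)] for i in range(m - 1)]
    return det(matrix)


m = 4
# Cyclic Latin square: variety (i + j) mod m in row i, column j.
latin = [(i + j) % m for i in range(m) for j in range(m)]
# Hypothetical unbalanced assignment with N = (7, 3, 3, 3).
unbalanced = [0] * 7 + [1] * 3 + [2] * 3 + [3] * 3

delta_latin = delta_2(latin, m)            # m**(m - 2) = 16 by (27)
delta_unbalanced = delta_2(unbalanced, m)  # 7*3*3*3 / 16 by (26)
```

The unbalanced assignment gives a smaller determinant than the balanced one, in line with the maximization of the product $N_1 N_2 \cdots N_m$ under a fixed sum.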

Because of the condition $N_1 + N_2 + \cdots + N_m = m^2$, the right hand side of 
(26) becomes a maximum when $N_1 = N_2 = \cdots = N_m = m$. Thus $\Delta_2 \le \bar{\Delta}_2$ 
and consequently the Latin square design is proved to be most efficient. 

4. Efficiency of Graeco-Latin and higher squares. Consider m varieties 
$v_1, \dots, v_m$ and m treatments $q_1, \dots, q_m$. Suppose that we wish to find out 
by experimentation whether the yield is affected by varieties or treatments. 
For this purpose the experimental area is subdivided into $m^2$ plots lying in m 
rows and m columns and to each plot one of the varieties and one of the treatments 
is assigned. We call this arrangement a Graeco-Latin square if the following 
conditions are fulfilled: 1) each variety appears exactly once in each row and 
exactly once in each column; 2) each treatment appears exactly once in each row 
and exactly once in each column; 3) each variety is combined with each of the 
treatments exactly once. 

The following general abstract scheme includes the Latin square and Graeco-Latin 
square as special cases: Consider an r-way classification with m classes in 
each classification. Denote by $y_{a_1 a_2 \cdots a_r}$ the value of a certain characteristic of 
an individual who is classified in the $a_1$-class of the first classification, in the $a_2$-class 
of the second classification, $\dots$, and in the $a_r$-class of the r-th classification. 
Suppose that $m^2$ observations are made for the purpose of investigating 
the effect of the classes on the value of the characteristic under consideration. 
We will say that we have a generalized Latin square design if the following condition 
is fulfilled: Let i, j, m' and m'' be an arbitrary set of four positive integers 
for which $i \ne j$, $i \le r$, $j \le r$, $m' \le m$ and $m'' \le m$. Then among the $m^2$ individuals 
observed there exists exactly one individual who belongs to the m'-class of the 
i-th classification and m''-class of the j-th classification. 

It is clear that if r = 3 the above scheme is a Latin square. If r = 4 we have 
a Graeco-Latin square. 
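The defining condition of a generalized Latin square design is easy to test mechanically. The sketch below is only an illustration: the representation of each observed individual as a tuple of class indices, the function name `is_generalized_latin_square`, and the example built from two orthogonal cyclic squares of order 3 are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations, product


def is_generalized_latin_square(observations, m):
    """Check that every pair of classifications shows each class pair exactly once.

    observations: list of tuples (a_1, ..., a_r), each a_i in range(m).
    """
    if len(observations) != m * m:
        return False
    r = len(observations[0])
    for i, j in combinations(range(r), 2):
        counts = Counter((obs[i], obs[j]) for obs in observations)
        if any(counts[pair] != 1 for pair in product(range(m), repeat=2)):
            return False
    return True


m = 3
# r = 4: row, column, variety, treatment (a Graeco-Latin square of order 3).
graeco_latin = [(i, j, (i + j) % m, (i + 2 * j) % m)
                for i in range(m) for j in range(m)]
# Using the same square twice violates condition 3) of the Graeco-Latin square.
degenerate = [(i, j, (i + j) % m, (i + j) % m)
              for i in range(m) for j in range(m)]

ok = is_generalized_latin_square(graeco_latin, m)
bad = is_generalized_latin_square(degenerate, m)
```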

Assume that the observations $y_{a_1 \cdots a_r}$ $(a_1, a_2, \dots, a_r = 1, \dots, m)$ are normally 
and independently distributed with a common variance $\sigma^2$. Assume 
furthermore that the expected value of $y_{a_1 \cdots a_r}$ is given by 

$$E(y_{a_1 \cdots a_r}) = \gamma_{1 a_1} + \cdots + \gamma_{r a_r}.$$

The parameters $\sigma$ and $\gamma_{ia}$ $(i = 1, \dots, r;\ a = 1, \dots, m)$ are unknown constants. 
Suppose that we wish to test the hypothesis that 

(28) $$\gamma_{i1} = \gamma_{i2} = \cdots = \gamma_{im}.$$

It can be shown that if the number of observations is limited to $m^2$, we obtain a 
most efficient design by constructing a generalized Latin square. The proof of 
this statement is similar to that of the efficiency of the Latin square and is 
therefore omitted. 


SOME SIGNIFICANCE TESTS FOR NORMAL BIVARIATE 
DISTRIBUTIONS 

By D. S. Villars and T. W. Anderson 

United States Rubber Company, Passaic, New Jersey, and Princeton University 

1. Introduction. In the theory of linear regression of y on x where y is normally 
distributed about a linear function of x, say $\nu + \beta x$, where x is a “fixed” 
variate, the t-test for the hypothesis that $\beta$ is zero (that y is distributed 
about $\nu$, independent of x) is well known. In this paper we apply some general 
statistical theory to the similar problem where x and y are jointly normally 
distributed. This case is commonly known as the case of “error in both variates.” 
We derive a criterion for testing the hypothesis that the population 
means are the coordinates of a specified point when the ratio of the variances 
and the population correlation coefficient are known. When the ratio of variances 
is known, a criterion is derived to test whether the correlation coefficient 
is zero. 

2. The means. Let us consider a sample of n pairs of observations $(x_1, y_1; 
x_2, y_2; \dots; x_n, y_n)$ from a normal bivariate population. Let the variances of 
x and of y be $\sigma_x^2$ and $\sigma_y^2$, respectively; and the correlation coefficient, say $\rho$, be 
zero. Suppose the ratio of the weight of y to the weight of x, say $\gamma = w_y / w_x = 
\sigma_x^2 / \sigma_y^2$, is known although the variances are not known. It is clear then, that 
$\sqrt{\gamma}\, y$ has variance $\sigma_x^2$. Since the observations $y_i$ $(i = 1, 2, \dots, n)$ can be transformed 
into revised observations $\sqrt{\gamma}\, y_i = y'_i$, we lose no generality by assuming 
that x and y are both distributed with variance $\sigma^2$. 

Under the assumption of equality of variances and independence of variates
we shall derive a criterion for testing the null hypothesis that each observation
$x_i$ is of a variate distributed about the same population mean $\mu$ and each
observation $y_i$ is of a variate distributed about the same population mean $\nu$. The
hypothesis may be stated symbolically as:

$$H_0:\ E(x) = \mu,\quad E(y) = \nu,$$

given $\sigma_x^2 = \sigma_y^2 = \sigma^2$ and $\rho = 0$. We can write

$$\sum_{i=1}^{n} (x_i - \mu)^2 = n(\bar{x} - \mu)^2 + S_x,$$

$$\sum_{i=1}^{n} (y_i - \nu)^2 = n(\bar{y} - \nu)^2 + S_y,$$

where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$

$$S_x = \sum_{i=1}^{n} (x_i - \bar{x})^2,\qquad S_y = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

Then $n(\bar{x} - \mu)^2/\sigma^2$ and $n(\bar{y} - \nu)^2/\sigma^2$ are each distributed independently as $\chi^2$
with one degree of freedom, and each of $S_x/\sigma^2$ and $S_y/\sigma^2$ follows the $\chi^2$-law with
$n - 1$ degrees of freedom. If we define

(1)  $$r = \sqrt{(\bar{x} - \mu)^2 + (\bar{y} - \nu)^2},\qquad S_r = S_x + S_y,$$

then $nr^2/\sigma^2$ and $S_r/\sigma^2$ have independent $\chi^2$-distributions with 2 and $2n - 2$
degrees of freedom, respectively.

It follows from this that

(2)  $$R = \frac{nr^2}{2\sigma^2} \Big/ \frac{S_r}{(2n - 2)\sigma^2} = n(n - 1)\,\frac{r^2}{S_r} = n(n - 1)\,\frac{(\bar{x} - \mu)^2 + (\bar{y} - \nu)^2}{S_x + S_y}$$

has the F-distribution with 2 and $2n - 2$ degrees of freedom.
Let us define $F_\alpha$ so

(3)  $$\int_{F_\alpha}^{\infty} f_{2,\,2n-2}(F)\, dF = \alpha,$$

where $f_{2,\,2n-2}(F)$ is the density of the F-distribution with 2 and $2n - 2$ degrees of freedom and
$0 < \alpha < 1$. Then the probability is $\alpha$ that the sample statistic $R$ is greater than
or equal to $F_\alpha$, i.e.,

(4)  $$P[R \ge F_\alpha] = \alpha.$$

In considering a sample value of $R$, at significance level $\alpha$, one rejects the
hypothesis of the means being $\mu$ and $\nu$, respectively, if $R$ is larger than $F_\alpha$, i.e.,
larger than 1 and larger than the $\alpha$ significance point in Snedecor’s tables [1].
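
The test of the means can be sketched in a few lines of Python. This is only an illustration under the assumptions of this section (equal variances, zero correlation); the function name `means_test` is hypothetical. In place of Snedecor's tables it uses the fact that, for an F variate with 2 and $2n - 2$ degrees of freedom, the upper-tail probability has the closed form $(1 + F/(n-1))^{-(n-1)}$.

```python
def means_test(xs, ys, mu, nu):
    """Return (R, p) for H0: E(x) = mu, E(y) = nu.

    Assumes equal variances and zero correlation, as in the text.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # S_r = S_x + S_y, the pooled sum of squares about the sample means.
    s_r = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    # r^2 is the squared distance from the sample means to (mu, nu).
    r2 = (xbar - mu) ** 2 + (ybar - nu) ** 2
    R = n * (n - 1) * r2 / s_r
    # Upper tail of F with 2 and 2n - 2 degrees of freedom, in closed form.
    p = (1.0 + R / (n - 1)) ** (-(n - 1))
    return R, p
```

One would reject the hypothesis at level $\alpha$ when the returned `p` is below $\alpha$, which is equivalent to $R > F_\alpha$.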

This F-test is a straightforward generalization to the bivariate case of the
usual $t$-test as applied to the univariate case. In each case the sum of squares
of distances of the observations from the population mean is broken up into the
sum of squares of distances from the sample mean plus $n$ times the square of the
distance from the sample mean to the population mean. The $t$-test for the
univariate case depends on the ratio of the distance of the sample mean from the
population mean to the square root of the sum of squares of distances from the
observations to the sample mean. The proposed F-test depends upon the ratio
of the square of the distance of the sample mean from the population mean to
the sum of squares of distances from the observations to the sample mean.

It can easily be shown that the likelihood ratio criterion for this hypothesis is

(5)  $$\left[\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (y_i - \bar{y})^2}{\sum_{i=1}^{n} (x_i - \mu)^2 + \sum_{i=1}^{n} (y_i - \nu)^2}\right]^{n}.$$

The hypothesis considered here is one of a class of hypotheses treated by
Kolodziejczyk [2] in a paper in which he considers the likelihood ratio criterion for a
set of general linear hypotheses.

Equation (4) may be written

(6)  $$P\{(\bar{x} - \mu)^2 + (\bar{y} - \nu)^2 \ge r_\alpha^2\} = \alpha,$$

where $r_\alpha^2 = F_\alpha (S_x + S_y)/[n(n - 1)]$. The probability is $\alpha$ that the distance
from the sample means $\bar{x}, \bar{y}$ to the population means $\mu, \nu$ is greater than or equal
to $r_\alpha$. We may call $r_\alpha$ the fiducial radius [3], and the equation
$(\bar{x} - \mu)^2 + (\bar{y} - \nu)^2 = r_\alpha^2$ defines the confidence region for the population means.
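
As a rough sketch, the fiducial radius can be computed without tables, since the F-distribution with 2 and $2n - 2$ degrees of freedom has the explicit percentage point $F_\alpha = (n-1)(\alpha^{-1/(n-1)} - 1)$. The function name `fiducial_radius` is an assumption made for illustration only.

```python
import math


def fiducial_radius(xs, ys, alpha):
    """Radius r_alpha of the circular confidence region about (xbar, ybar).

    Assumes equal variances and zero correlation, as in the text.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_r = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    # Percentage point of F with 2 and 2n - 2 degrees of freedom:
    # solve (1 + F/(n-1))^-(n-1) = alpha for F.
    f_alpha = (n - 1) * (alpha ** (-1.0 / (n - 1)) - 1.0)
    return math.sqrt(f_alpha * s_r / (n * (n - 1)))
```

The region of confidence $1 - \alpha$ is then the disc of this radius centered at the sample means.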

Suppose we have two samples of $n_1$ and $n_2$ pairs of observations, respectively,
from normal bivariate distributions. If the population mean of each $x$ variate
is $\mu$ and the population mean of each $y$ variate is $\nu$, the population variance of
each variate is $\sigma^2$, and the correlation coefficient is zero, then the sample means
$\bar{x}_1$ and $\bar{y}_1$ of the first sample and $\bar{x}_2$ and $\bar{y}_2$ of the second sample follow normal
distributions. Also $\bar{x}_1 - \bar{x}_2$ and $\bar{y}_1 - \bar{y}_2$ are normally distributed. Then

$$\frac{n_1 n_2}{n_1 + n_2}\,\frac{r'^2}{\sigma^2} = \frac{n_1 n_2}{n_1 + n_2}\,\frac{(\bar{x}_1 - \bar{x}_2)^2 + (\bar{y}_1 - \bar{y}_2)^2}{\sigma^2}$$

has the $\chi^2$-distribution with 2 degrees of freedom. Let

(7)  $$S_r' = \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_1} (y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2 + \sum_{i=1}^{n_2} (y_{2i} - \bar{y}_2)^2,$$

where $x_{1i}, y_{1i}$ $(i = 1, 2, \ldots, n_1)$ are the pairs of observations in the first sample
and $x_{2i}, y_{2i}$ $(i = 1, 2, \ldots, n_2)$ are the pairs of observations in the second sample.
$S_r'/\sigma^2$ is distributed according to the $\chi^2$-distribution with $(2n_1 + 2n_2 - 4)$
degrees of freedom because it is the sum of quantities independently distributed
as $\chi^2$. Then

$$R' = \frac{n_1 n_2\, r'^2}{2(n_1 + n_2)\sigma^2} \Big/ \frac{S_r'}{(2n_1 + 2n_2 - 4)\sigma^2} = \frac{n_1 n_2 (n_1 + n_2 - 2)\, r'^2}{(n_1 + n_2)\, S_r'}$$

has the F-distribution with 2 and $(2n_1 + 2n_2 - 4)$ degrees of freedom. This
fact yields us a significance test for the hypothesis that both the means of the
$x$ variates and the means of the $y$ variates for the two populations are the same.
We can also set up confidence regions for $\mu_1 - \mu_2$ and $\nu_1 - \nu_2$.
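
A minimal sketch of the two-sample statistic follows, again assuming a common variance and zero correlation. The name `two_sample_means_test` is hypothetical, and the tail probability uses the closed form available for an F variate whose first number of degrees of freedom is 2.

```python
def two_sample_means_test(sample1, sample2):
    """Return (R', p) for equality of both means in two bivariate samples.

    Each sample is a list of (x, y) pairs.
    """
    def mean_and_ss(pairs):
        n = len(pairs)
        xbar = sum(x for x, _ in pairs) / n
        ybar = sum(y for _, y in pairs) / n
        ss = sum((x - xbar) ** 2 + (y - ybar) ** 2 for x, y in pairs)
        return n, xbar, ybar, ss

    n1, x1, y1, ss1 = mean_and_ss(sample1)
    n2, x2, y2, ss2 = mean_and_ss(sample2)
    r2 = (x1 - x2) ** 2 + (y1 - y2) ** 2   # squared distance between mean points
    s_r = ss1 + ss2                        # pooled within-sample sum of squares
    m = n1 + n2 - 2                        # half the second degrees of freedom
    R = n1 * n2 * m * r2 / ((n1 + n2) * s_r)
    # Upper tail of F with 2 and 2m degrees of freedom.
    p = (1.0 + R / m) ** (-m)
    return R, p
```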

Now let us consider a sample from a normal bivariate population with means
$\mu$ and $\nu$, variances $\sigma_x^2$ and $\sigma_y^2$ and correlation coefficient $\rho$. Suppose $\gamma = \sigma_x^2/\sigma_y^2$
and $\rho$ are known. The transformation

(8)  $$x = \frac{\sqrt{1 + \rho}\,x' + \sqrt{1 - \rho}\,y'}{\sqrt{2}},\qquad y = \frac{\sqrt{1 + \rho}\,x' - \sqrt{1 - \rho}\,y'}{\sqrt{2\gamma}}$$

gives us the variates $x'$ and $y'$ which are distributed independently and with
variance $\sigma_x^2$. Applying the results above we see that

(9)  $$R = n(n - 1)\,\frac{(\bar{x}' - \mu')^2 + (\bar{y}' - \nu')^2}{\sum_{i=1}^{n} (x_i' - \bar{x}')^2 + \sum_{i=1}^{n} (y_i' - \bar{y}')^2}
= n(n - 1)\,\frac{(\bar{x} - \mu)^2 - 2\rho\sqrt{\gamma}\,(\bar{x} - \mu)(\bar{y} - \nu) + \gamma(\bar{y} - \nu)^2}{\sum (x_i - \bar{x})^2 - 2\rho\sqrt{\gamma} \sum (x_i - \bar{x})(y_i - \bar{y}) + \gamma \sum (y_i - \bar{y})^2}$$

has the F-distribution with 2 and $2n - 2$ degrees of freedom. From this we
derive significance tests, fiducial radii, and confidence regions as before.

The above distributions, significance tests, and confidence regions are easily
generalized to multivariate normal distributions. Suppose we have a sample of
$n$ $k$-tuples of observations $(x_{i\alpha})$ $(i = 1, 2, \ldots, k;\ \alpha = 1, 2, \ldots, n)$ from a
$k$-variate normal distribution. Let the expected value of each variate $x_i$ be zero
$(i = 1, 2, \ldots, k)$, the variance of each variate be $\sigma^2$ and each correlation
coefficient be zero. Then

(10)  $$R'' = \frac{n(n - 1) \sum_{i=1}^{k} \bar{x}_i^2}{\sum_{i=1}^{k} \sum_{\alpha=1}^{n} (x_{i\alpha} - \bar{x}_i)^2}$$

has the F-distribution with $k$ and $k(n - 1)$ degrees of freedom. Significance
tests, confidence regions, and fiducial radii follow from this fact.

3. Linear Regression. If one has a sample of $n$ pairs of observations
$(x_1, y_1;\ x_2, y_2;\ \ldots;\ x_n, y_n)$ from a normal bivariate population and wishes to fit a
straight line to the scatter of sample points, one fits the line in such a way that
the sum of squares of distances from the sample points to the line is a minimum
(“error in both variates”).

It is easily shown that this line goes through the point whose coordinates are
the sample means $(\bar{x}, \bar{y})$. If the slope of a line through $(\bar{x}, \bar{y})$ is $\tan\theta$, the
distance from a sample point $(x_i, y_i)$ to the line is $(x_i - \bar{x}) \sin\theta - (y_i - \bar{y}) \cos\theta$.
The sum of squares of distances from sample points to the line is

$$\sin^2\theta\, S_x - 2 \sin\theta \cos\theta\, S_{xy} + \cos^2\theta\, S_y,$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

If we minimize the above expression with respect to $\theta$ we find

(11)  $$b = \tan\theta = \frac{S_y - S_x \pm \sqrt{(S_y - S_x)^2 + 4 S_{xy}^2}}{2 S_{xy}}.$$

Using the plus sign gives us $S_p$, the minimum sum of squared distances; using
the minus sign gives us $S_a$, the maximum sum of squared distances. (The latter
value of $\tan\theta$ is the negative reciprocal of the former.)
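
The slope in (11) is easy to evaluate directly. The sketch below takes the plus sign, which gives the line minimizing the sum of squared perpendicular distances; the function name `orthogonal_slope` is hypothetical, and the code assumes $S_{xy} \neq 0$.

```python
import math


def orthogonal_slope(xs, ys):
    """Slope b = tan(theta) from equation (11), plus sign; needs S_xy != 0."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_x = sum((x - xbar) ** 2 for x in xs)
    s_y = sum((y - ybar) ** 2 for y in ys)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    root = math.sqrt((s_y - s_x) ** 2 + 4.0 * s_xy ** 2)
    return (s_y - s_x + root) / (2.0 * s_xy)
```

Unlike the ordinary least-squares slope $S_{xy}/S_x$, this value is unchanged in meaning when the roles of $x$ and $y$ are exchanged: the fitted line is the same, and the slope is simply replaced by its reciprocal.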

$S_p$ is the sum of squared distances perpendicular to the regression line and
$S_a$ is the sum of squared distances along the regression line. The sum $S_p + S_a$
is equal to $S_x + S_y$, which is the sum of squares of distances from the sample
points to the point $(\bar{x}, \bar{y})$. We have thus decomposed $S_x + S_y$ into two components,
one perpendicular to the regression line and the other along the regression
line.

The joint distribution of $S_p$ and $S_a$ may be derived from the Wishart distribution
of the sums of squares and cross products,$^1$

(12)  $$\frac{\begin{vmatrix} S_x & S_{xy} \\ S_{xy} & S_y \end{vmatrix}^{\frac{1}{2}(n-4)} e^{-(S_x + S_y)/2\sigma^2}}{4\pi\sigma^{2n-2}\,\Gamma(n - 2)}.$$

Let us make the transformation

$$S_x = \cos^2\theta\, S_a + \sin^2\theta\, S_p,$$
$$S_y = \sin^2\theta\, S_a + \cos^2\theta\, S_p,$$
$$S_{xy} = \sin\theta \cos\theta\, (S_a - S_p).$$

The value of $\theta$ corresponds to the plus sign in (11). We find

$$S_x + S_y = S_p + S_a,\qquad \begin{vmatrix} S_x & S_{xy} \\ S_{xy} & S_y \end{vmatrix} = S_a S_p.$$

The Jacobian of the transformation is $(S_a - S_p)$. Using these relations in (12)
and integrating out $\theta$ we derive the distribution of $S_a$ and $S_p$:

(13)  $$\frac{1}{4\sigma^6\,\Gamma(n - 2)} \left(\frac{S_a S_p}{\sigma^4}\right)^{\frac{1}{2}(n-4)} e^{-(S_a + S_p)/2\sigma^2}\,(S_a - S_p).$$

It can be shown that $S_a$ and $S_p$ are the characteristic roots of the sample
variance-covariance matrix. The distribution (13) of the characteristic roots of a
variance-covariance matrix when the population correlation coefficient is zero
and the variances are equal has been demonstrated by P. L. Hsu [4].

As a test of correlation (i.e., test of significance of the regression coefficient)
we propose using the ratio

$$F' = S_a/S_p.$$

This ratio is the maximum ratio of the sum of squared deviations in one direction
to the sum of squared deviations in the perpendicular direction. It is intuitively
evident that this ratio is probably near unity if the null hypothesis is true, that
is, if the variances are equal and the correlation is zero. If the correlation
is not zero then the ratio is likely to be large.

$^1$ This distribution is equivalent to Fisher’s distribution of the sample variances and
correlation coefficient when the population correlation coefficient is zero.

From (13) we can deduce the distribution of $F'$ by transforming variables
and integrating out the extraneous one. This procedure yields us as the
distribution of $F'$

$$(n - 2)\,2^{n-3}\,F'^{\frac{1}{2}(n-4)}\,(F' + 1)^{-(n-1)}\,(F' - 1).$$

If we make the transformation

$$F' = e^{2z'}$$

we find the probability element of $z'$ to be

$$(n - 2)(\cosh z')^{-(n-1)}\, d(\cosh z').$$

After integrating we see the cumulative distribution of $z'$ is

$$1 - (\cosh z')^{-(n-2)}.$$

Critical values of $z'$ for various levels of significance may be determined from a
table of hyperbolic cosines. Table I gives some values of $z'$ and the
corresponding values of $F'$.
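
Because the upper-tail probability of $z'$ is $P = (\cosh z')^{-(n-2)}$, a critical value can be obtained by inverting the hyperbolic cosine rather than by reading a table. The following is a small sketch; the function names are hypothetical.

```python
import math


def z_critical(n, p):
    """Value of z' whose upper-tail probability is p, for sample size n > 2."""
    return math.acosh(p ** (-1.0 / (n - 2)))


def f_critical(n, p):
    """Corresponding critical value of F' = exp(2 z')."""
    return math.exp(2.0 * z_critical(n, p))


def tail_probability(n, z):
    """Upper-tail probability P(z' >= z) = cosh(z)^-(n-2)."""
    return math.cosh(z) ** (-(n - 2))
```

For example, `z_critical(3, 0.05)` is the inverse hyperbolic cosine of 20, and `f_critical` squares the quantity $\cosh z' + \sinh z'$.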


TABLE I

Percentage points for the z' (or F') distribution

                      z'                                       F'
  n    P.20   P.10   P.05   P.01  P.001      P.20   P.10   P.05    P.01      P.001
  3   2.292  2.993  3.688  5.298  7.601      98.0    398   1600  40,000  4,000,000
  4   1.444  1.818  2.178  2.993  4.144      17.9   38.0   78.0     398      4,000
  5     ...    ...  1.656  2.216  2.993      9.59   16.5   27.4    84.2        398
  6    .958  1.178  1.381  1.818  2.412      6.79   10.6   15.8    38.0        124
  7    .846    ...  1.207  1.572  2.059      5.43   7.92   11.2    23.2       61.4
  8    .766   .933  1.084  1.402  1.818      4.63   6.47   8.74    16.6       38.0
  9     ...   .866   .992  1.276  1.643      4.09   5.55   7.28    12.7       26.8
 10    .656   .796   .920  1.178  1.509      3.71   4.91   6.30    10.6       20.6
 11    .616   .746   .862  1.100  1.402      3.43   4.45   5.61    9.02       16.5
 12    .583   .705   .813  1.035  1.314      3.21   4.10   5.09     ...       13.9
 13    .554   .670   .772   .980  1.241      3.03   3.82   4.68     ...       12.0
 14    .530    ...   .736   .933  1.178      2.89   3.50   4.36    6.47       10.6
 15    .508   .613   .705   .892  1.124      2.76   3.41   4.10     ...       9.47
 20    .429   .517   .593   .746   .933      2.36   2.81   3.27    4.46       6.47
 25    .378    ...   .522   .654   .814      2.13   2.48   2.84    3.70       5.10
 30    .342   .411   .471   .589   .732      1.98   2.28   2.57    3.25       4.32
 40    .293   .362    ...    ...   .621      1.80   2.02   2.23    2.73       3.47
 60    .237   .284   .324    ...   .498      1.61   1.76   1.91    2.24       2.71
120    .165   .198   .226   .281   .345      1.39   1.49   1.57    1.76       2.00


The use of $F'$ has been suggested here to test the hypothesis that the population
correlation coefficient is zero when it is known that the variances of the two
variates are the same, or, more generally, when the ratio of the two variances is
known. This gives a test of significance of the regression coefficient when there
is error in both variates if the ratio of the variances is known. The test arises
from intuitive considerations. $F'$ can also be used to test the hypothesis that
$\rho = 0$ and $\sigma_x^2 = \sigma_y^2$ ($H_1$ in Hsu’s paper). C. T. Hsu [5] and J. W. Mauchly [6]
have shown that the likelihood ratio criterion for this hypothesis is

$$\left[\frac{4(S_x S_y - S_{xy}^2)}{(S_x + S_y)^2}\right]^{n/2} = \left[\frac{4F'}{(F' + 1)^2}\right]^{n/2}.$$

If we set the normal distribution function equal to a constant, we determine
a contour ellipse in the $x,y$-plane. Since these ellipses of constant probability
density are circles when $\rho = 0$ and $\sigma_x^2 = \sigma_y^2$, Mauchly calls the test a test of
circularity. The same procedure as used to test whether these ellipses are circles can
be used to test whether the ellipses have major axes in a certain direction and
with a specified ratio of lengths of axes. Suppose we wish to test the hypothesis
that the major axis is inclined to the $x$ axis at an angle $\theta$ and that the ratio of
lengths of the major axis to the minor axis is $k$. This is equivalent to the
hypothesis that $\rho = \rho_0$ and $\sigma_x^2 = \gamma_0 \sigma_y^2$. To do this we rotate coordinate axes of the
variables of the distribution (hence changing coordinates of all sample points)
through $\theta$ and change the scale of one of the new variables by the factor of $k$.
The transformation is

$$x = kx' \cos\theta - y' \sin\theta,$$

$$y = kx' \sin\theta + y' \cos\theta.$$

In terms of $x', y'$ the null hypothesis is $\rho' = 0$, $\sigma_{x'}^2 = \sigma_{y'}^2$, and one proceeds as
above. Of course, if $\gamma_0$ is known then this method can be used to test the null
hypothesis that $\rho = \rho_0$.

4. Illustrative Example. An application of the formulae given above may be
illustrated from the data in Table II, which gives two sets of electrical
conductivity measurements at different field strengths. The assumption that the two
variances are equal is thus reasonable.

TABLE II

Table of Pairs of Observations of Electrical Conductivity

  x_i    y_i      x_i    y_i
  5.0    5.1      5.5    5.1
  7.4    7.0      5.3    5.0
  7.0    7.7      4.7    4.4
  8.8    7.7      8.6    7.1
  7.8    6.8      7.5    7.3
  5.1    5.5      5.6    6.3
  6.6    7.4      7.4    6.5
  8.8    7.7



Is it reasonable to regard $x$ and $y$ as being independently distributed in the
population on the basis of these data?

The sums of squares and cross products of deviations from the means and the
calculated slope are:

$$S_x = 29.40,\qquad S_{xy} = 19.99,$$

$$S_y = 18.04,\qquad b = 0.7554.$$





The maximized variance ratio is:

$$F' = \frac{S_x + 2bS_{xy} + b^2 S_y}{b^2 S_x - 2bS_{xy} + S_y} = \frac{69.89}{4.615} = 15.14,$$

$$z' = \tfrac{1}{2} \ln F' = 1.36.$$

Comparing with Table I for $n = 15$ we find this value of $z'$ very highly
significant (probability less than 0.001), and at this probability level and on the basis
of our data, $x$ and $y$ cannot be considered to be independent in the population.
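
The arithmetic of this example can be checked with a short script. The sketch below computes $F'$ as the ratio of the larger to the smaller characteristic root of the matrix of sums of squares and products, which agrees with the expression in terms of $b$ given above; the name `variance_ratio_test` is hypothetical, and the data are the fifteen pairs of Table II read row by row.

```python
import math

# Pairs (x_i, y_i) read from Table II.
PAIRS = [
    (5.0, 5.1), (5.5, 5.1), (7.4, 7.0), (5.3, 5.0), (7.0, 7.7),
    (4.7, 4.4), (8.8, 7.7), (8.6, 7.1), (7.8, 6.8), (7.5, 7.3),
    (5.1, 5.5), (5.6, 6.3), (6.6, 7.4), (7.4, 6.5), (8.8, 7.7),
]


def variance_ratio_test(pairs):
    """Return (F', z', P) where P is the upper-tail probability of z'."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    s_x = sum((x - xbar) ** 2 for x, _ in pairs)
    s_y = sum((y - ybar) ** 2 for _, y in pairs)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    root = math.sqrt((s_x - s_y) ** 2 + 4.0 * s_xy ** 2)
    s_a = 0.5 * (s_x + s_y + root)   # sum of squares along the fitted line
    s_p = 0.5 * (s_x + s_y - root)   # sum of squares perpendicular to it
    f = s_a / s_p
    z = 0.5 * math.log(f)
    p = math.cosh(z) ** (-(n - 2))
    return f, z, p
```

Calling `variance_ratio_test(PAIRS)` gives the ratio, its logarithmic transform, and a tail probability that can be compared with the 0.001 column of Table I.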

Since the regression is significant, it becomes of interest to compute the
calculated points $X_i$ and $Y_i$ which fall on the regression line

$$Y = 1.35 + 0.7554\,X,$$

corresponding to each observed point $x_i, y_i$. They are obtained from these
equations:

$$Y_i = \bar{y} + \frac{b}{1 + b^2}(x_i - \bar{x}) + \frac{b^2}{1 + b^2}(y_i - \bar{y}) = .481x_i + .363y_i + .86,$$

$$X_i = \bar{x} + \frac{1}{1 + b^2}(x_i - \bar{x}) + \frac{b}{1 + b^2}(y_i - \bar{y}) = .637x_i + .481y_i - .65.$$


The minimized sum of squared deviations from the regression line (i.e., squared
distances between observed and calculated points) is the denominator of the
expression for $F'$ divided by the factor $(1 + b^2)$,

$$4.615/1.5706 = 2.94.$$
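
The calculated points are the perpendicular projections of the observed points onto the fitted line, so the minimized sum of squares can also be obtained by summing the squared distances directly. A minimal sketch, with the hypothetical name `project_onto_line` and assuming $S_{xy} \neq 0$:

```python
import math


def project_onto_line(pairs):
    """Return (b, fitted, s_p): slope, projected points, and the sum of
    squared distances between observed and projected points."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    s_x = sum((x - xbar) ** 2 for x, _ in pairs)
    s_y = sum((y - ybar) ** 2 for _, y in pairs)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    b = (s_y - s_x + math.sqrt((s_y - s_x) ** 2 + 4.0 * s_xy ** 2)) / (2.0 * s_xy)
    d = 1.0 + b * b
    fitted = [
        (xbar + ((x - xbar) + b * (y - ybar)) / d,
         ybar + (b * (x - xbar) + b * b * (y - ybar)) / d)
        for x, y in pairs
    ]
    s_p = sum((x - X) ** 2 + (y - Y) ** 2
              for (x, y), (X, Y) in zip(pairs, fitted))
    return b, fitted, s_p
```

Each projected point satisfies $Y_i - \bar{y} = b(X_i - \bar{x})$, which is a convenient check on the coefficients.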


It should perhaps be pointed out that the tests of the means described in the first
part of this paper are no longer applicable since we do not know the population
correlation coefficient.



SYMMETRIC TESTS OF THE HYPOTHESIS THAT THE MEAN OF
ONE NORMAL POPULATION EXCEEDS THAT OF ANOTHER

By Herbert A. Simon
Illinois Institute of Technology

1. Introduction. One of the most commonly recurring statistical problems is
to determine, on the basis of statistical evidence, which of two samples, drawn
from different universes, came from the universe with the larger mean value of a
particular variate. Let $M_y$ be the mean value which would be obtained with
universe (Y) and $M_x$ be the mean value which would be obtained with universe
(X). Then a test may be constructed for the hypothesis$^1$ $M_y > M_x$.

If $x_1, \ldots, x_n$ are the observed values of the variate obtained from universe
(X), and $y_1, \ldots, y_n$ are the observed values obtained from universe (Y), then
the sample space of the points $E: (x_1, \ldots, x_n;\ y_1, \ldots, y_n)$ may be divided into
three regions $\omega_0$, $\omega_1$, and $\omega_2$. If the sample point falls in the region $\omega_0$, the
hypothesis $M_y > M_x$ is accepted; if the sample point falls in the region $\omega_1$, the
hypothesis $M_y > M_x$ is rejected; if the sample point falls in the region $\omega_2$,
judgment is withheld on the hypothesis. Regions $\omega_0$, $\omega_1$, and $\omega_2$ are mutually
exclusive and, together, fill the entire sample space. Any such set of regions
$\omega_0$, $\omega_1$, and $\omega_2$ defines a test for the hypothesis $M_y > M_x$.

In those cases, then, where the experimental results fall in the region $\omega_2$, the
test leads to the conclusion that there is need for additional data to establish a
result beyond reasonable doubt. Under these conditions, the test does not
afford any guide to an unavoidable or non-postponable choice. In the application
of statistical findings to practical problems it often happens, however, that
judgment cannot be held in abeyance, that some choice must be made, even at
a risk of error. For example, when planting time comes, a choice must be made
between varieties (X) and (Y) of grain even if neither has been conclusively
demonstrated, up to that time, to yield a larger crop than the other. It is the
purpose of this paper to propose a criterion which will always permit a choice
between two experimental results, that is, a test in which the regions $\omega_0$ and $\omega_1$
fill the entire sample space. In the absence of a region $\omega_2$, any observed result
is interpreted as a definite acceptance or rejection of the hypothesis tested.

2. General characteristics of the criterion. Let us designate the hypothesis
$M_y > M_x$ as $H_0$ and the hypothesis $M_x > M_y$ as $H_1$. Then a pair of tests, $T_0$
and $T_1$, for $H_0$ and $H_1$ respectively must, to suit our needs, have the following
properties:

(1) The regions $\omega_{00}$ ($\omega_{00}$ is the region of acceptance for $H_0$, $\omega_{10}$ the region of
rejection for $H_0$; $\omega_{01}$ and $\omega_{11}$ the corresponding regions for $H_1$) and $\omega_{11}$ must
coincide, as must the regions $\omega_{10}$ and $\omega_{01}$. This correspondence means that when
$H_0$ is accepted, $H_1$ is rejected, and vice versa. Hence, the tests $T_0$ and $T_1$ are
identical, and we shall hereafter refer only to the former.

$^1$ This paper presupposes a familiarity with the theory of testing statistical hypotheses as
set forth by J. Neyman and E. S. Pearson [1].

(2) There must be no regions $\omega_{20}$ and $\omega_{21}$. This means that judgment is never
held in abeyance, no matter what sample is observed.

(3) The regions $\omega_{00}$ and $\omega_{10}$ must be so bounded that the probability of
accepting $H_1$ when $H_0$ is true (error of the first kind for $T_0$) and the probability of
accepting $H_0$ when $H_1$ is true (error of the second kind for $T_0$) are, in a certain
sense, minimized. Since $H_0$ and $H_1$ are composite hypotheses, the probability
that a test will accept $H_1$ when $H_0$ is true depends upon which of the simple
hypotheses that make up $H_0$ is true.

Neyman and Pearson [2] have proposed that a test, $T_\alpha$, for a hypothesis be
termed uniformly more powerful than another test, $T_\beta$, if the probability for $T_\alpha$
of accepting the hypothesis if it is false, or the probability of rejecting it if it is
true, does not exceed the corresponding probability for $T_\beta$ no matter which of
the simple hypotheses is actually true. Since there is no test which is uniformly
more powerful than all other possible tests, it is usually required that a test be
uniformly most powerful (UMP) among the members of some specified class
of tests.


3. A symmetric test when the two universes have equal standard deviations.
Let us consider, first, the hypothesis $M_y > M_x$ where the universes from
which observations of varieties (X) and (Y), respectively, are drawn are
normally distributed universes with equal standard deviations, $\sigma$, and means $M_x$ and
$M_y$ respectively. Let us suppose a sample drawn of $n$ random observations from
the universe of variety (X) and a sample of $n$ independent and random
observations from the universe of (Y). The probability distribution of points in the
sample space is given by

(1)  $$p(x_1, \ldots, x_n;\ y_1, \ldots, y_n) = (2\pi\sigma^2)^{-n} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_i (x_i - M_x)^2 + \sum_i (y_i - M_y)^2\right]\right\}.$$

In testing the hypothesis $M_y > M_x$, there is a certain symmetry between the
alternatives (X) and (Y). If there is no a priori reason for choosing (X) rather
than (Y), and if the sample point $E_1: (a_1, \ldots, a_n;\ b_1, \ldots, b_n)$ falls in the region
of acceptance of $H_0$, then the point $E_2: (b_1, \ldots, b_n;\ a_1, \ldots, a_n)$ should fall in
the region of acceptance of $H_1$. That is, if $E_1$ is taken as evidence that $M_y > M_x$,
then $E_2$ can with equal plausibility be taken as evidence that $M_x > M_y$.

Any test such that $E_1: (a_1, \ldots, a_n;\ b_1, \ldots, b_n)$ lies in $\omega_0$ whenever
$E_2: (b_1, \ldots, b_n;\ a_1, \ldots, a_n)$ lies in $\omega_1$, and vice versa, will be designated a symmetric
test of the hypothesis $M_y > M_x$. Let $\Omega$ be the class of symmetric tests of $H_0$.
If $T_\alpha$ is a member of $\Omega$, and is uniformly more powerful than every other $T_\beta$
which is a member of $\Omega$, then $T_\alpha$ is the uniformly most powerful symmetric test of $H_0$.

The hypothesis $M_y > M_x$ possesses a UMP symmetric test. This may be
shown as follows. From (1), the ratio can be calculated between the probability
densities at the sample points $E: (x_1, \ldots, x_n;\ y_1, \ldots, y_n)$ and
$E': (y_1, \ldots, y_n;\ x_1, \ldots, x_n)$. We get

(2)  $$\frac{p(E)}{p(E')} = \exp\left[\frac{n}{\sigma^2}(\bar{x} - \bar{y})(M_x - M_y)\right],$$

where

$$\bar{x} = \frac{1}{n}\sum_i x_i,\qquad \bar{y} = \frac{1}{n}\sum_i y_i.$$

Now the condition $p(E) > p(E')$ is equivalent to $\frac{n}{\sigma^2}(\bar{x} - \bar{y})(M_x - M_y) > 0$.
Hence $p(E) > p(E')$ whenever $(\bar{x} - \bar{y})$ has the same sign as $(M_x - M_y)$.

Now for any symmetric test, if $E$ lies in $\omega_0$, $E'$ lies in $\omega_1$, and vice versa.
Suppose that, in fact, $M_y > M_x$. Consider a symmetric test, $T_\alpha$, whose region
$\omega_0$ contains a sub-region $\omega_{0U}$ (of measure greater than zero) such that $\bar{y} < \bar{x}$
for every point in that sub-region. Then for every point $E'$ in $\omega_{0U}$, $p(E') < p(E)$,
where $E$ is the symmetric point. Hence, a more powerful test, $T_\beta$, could be constructed which would be
identical with $T_\alpha$, except that $\omega_{1U}$, the sub-region symmetric to $\omega_{0U}$, would be
interchanged with $\omega_{0U}$ as a portion of the region of acceptance for $H_0$. Therefore,
a test such that $\omega_0$ contained all points for which $\bar{y} > \bar{x}$, and no others,
would be a UMP symmetric test. This result is independent of the magnitude
of $(M_x - M_y)$ provided only $M_y > M_x$. We conclude that $\bar{y} > \bar{x}$ is a uniformly
most powerful symmetric test for the hypothesis $M_y > M_x$.

The probability of committing an error with the UMP symmetric test is a
simple function of the difference $|M_y - M_x|$. The exact value can be found
by integrating (1) over the whole region of the sample space for which $\bar{y} < \bar{x}$.
There is no need to distinguish errors of the first and second kind, since an error of
the first kind with $T_0$ is an error of the second kind with $T_1$, and vice versa.
The probability of an error is one half when $M_x = M_y$, and in all other cases is
less than one half.
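
The integral just mentioned has a simple closed form, because the difference of the two sample means is normally distributed with mean $M_y - M_x$ and variance $2\sigma^2/n$. The sketch below evaluates the resulting error probability, $\Phi\bigl(-|M_y - M_x|\sqrt{n}/(\sigma\sqrt{2})\bigr)$, using the standard library's complementary error function; the function name `error_probability` is hypothetical.

```python
import math


def error_probability(m_x, m_y, sigma, n):
    """Chance that the rule 'choose the larger sample mean' picks wrongly.

    Assumes n independent normal observations from each universe, with a
    common standard deviation sigma.
    """
    delta = abs(m_y - m_x)
    # Standardized distance of the true mean difference from zero.
    z = delta * math.sqrt(n / 2.0) / sigma
    # Phi(-z), the lower tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

The value is one half when the two means coincide and decreases toward zero as the difference or the sample size grows.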

4. Relation of UMP symmetric test and test which is UMP of tests absolutely
equivalent to it. Neyman and Pearson [2] have shown the test $\bar{y} - \bar{x} > k$
to be UMP among the tests absolutely equivalent to it, for the hypothesis
$M_y > M_x$. They have defined a class of tests as absolutely equivalent if, for
each simple hypothesis in $H_0$, the probability of an error of the first kind is
exactly the same for all the tests which are members of the class. If $k$ be set
equal to zero, the criterion becomes $\bar{y} > \bar{x}$, and their test reduces to the UMP symmetric test. What is
the relation between these two classes of tests?

If $T_\alpha$ be the UMP symmetric test, then it is clear from Section 3 that there is
no other symmetric test, $T_\beta$, which is absolutely equivalent to $T_\alpha$. Hence $\Omega$,
the class of symmetric tests, and $\Lambda$, the class of tests absolutely equivalent to
$T_\alpha$, have only one member in common: the test $T_\alpha$ itself. Neyman and
Pearson have shown $T_\alpha$ to be the UMP test of $\Lambda$, while the results of Section 3
show $T_\alpha$ to be the UMP test of $\Omega$.

5. Justification for employing a symmetric test. In introducing Section 3, a
heuristic argument was advanced for the use of a symmetric, rather than an
asymmetric, test for the hypothesis $M_y > M_x$. This argument will now be given
a precise interpretation in terms of probabilities.

Assume, not a single experiment for testing the hypothesis $M_y > M_x$, but a
series of similar experiments. Suppose a judgment to be formed independently
on the basis of each experiment as to the correctness of the hypothesis. Is
there any test which, if applied to the evidence in each case, will maximize the
probability of a correct judgment in that experiment? Such a test can be shown
to exist, providing one further assumption is made: that if any criterion be applied
prior to the experiment to test the hypothesis $M_y > M_x$, the probability of a
correct decision will be one half. That is, it must be assumed that there is no
evidence which, prior to the experiment, will permit the variety with the greater
yield to be selected with greater-than-chance frequency.

Consider now any asymmetric test for the hypothesis $H_0$, that is, any test
which is not symmetric. The criterion $\bar{y} - \bar{x} > k$, where $k > 0$, is an example
of such a test. Unlike a symmetric test, an asymmetric test may give a different
result if applied as a test of the hypothesis $H_0$ than if applied as a test of the
hypothesis $H_1$. For instance, a sample point such that $\bar{y} - \bar{x} = \epsilon$, where
$k > \epsilon > 0$, would be considered a rejection of $H_0$ and acceptance of $H_1$ if the above
test were applied to $H_0$; but would be considered a rejection of $H_1$ and an
acceptance of $H_0$ if the test were applied to $H_1$. Hence, before an asymmetric
test can be applied to a problem of dichotomous choice (a problem where $H_0$ or
$H_1$ must be determinately selected), a decision must be reached as to whether the
test is to be applied to $H_0$ or to $H_1$. This decision cannot be based upon the
evidence of the sample to be tested, for in this case the complete test, which
would of course include this preliminary decision, would be symmetric by
definition.

Let $H_c$ be the correct hypothesis ($H_0$ or $H_1$, as the case may be) and let $H^*$
be the hypothesis to which the asymmetric test is applied. Since by assumption
there is no prior evidence for deciding whether $H_c$ is $H_0$ or $H_1$, we may employ
any random process for deciding whether $H^*$ is to be identified with $H_0$ or $H_1$.
If such a random selection is made, it follows that the probability that $H_c$ and
$H^*$ are identical is one half.

We designate as the region of asymmetry of a test the region of points
$E_1: (a_1, \ldots, a_n;\ b_1, \ldots, b_n)$ and $E_2: (b_1, \ldots, b_n;\ a_1, \ldots, a_n)$ of aggregate measure
greater than zero such that $E_1$ and $E_2$ both fall in $\omega_0$ or both fall in $\omega_1$. Suppose
$\omega_{0a}$ and $\omega_{0b}$ are a particular symmetrically disposed pair of subregions of the
region of asymmetry, which fall in $\omega_0$ of a test $T_0$. Suppose that, for every
point, $E_1$, in $\omega_{0a}$, $\bar{b} > \bar{a}$, and that $\omega_{0a}$ and $\omega_{0b}$ are of measure greater than zero.
The sum of the probabilities that the sample point will fall in $\omega_{0a}$ or $\omega_{0b}$ is exactly
the same whether $H_c$ and $H^*$ are the same hypothesis or are contradictory
hypotheses. In the first case $H_c$ will be accepted, in the second case $H_c$ will be
rejected. These two cases are of equal probability, hence there is a probability
of one half of accepting or rejecting $H_c$ if the sample point falls in the region of
asymmetry of $T_0$. But from equation (2) of Section 3 above, we see that if the
subregions $\omega_{0a}$ and $\omega_{0b}$ had been in a region of symmetry, and if $\omega_{0a}$ had been in
$\omega_0$, the probability of accepting $H_c$ would have been greater than the probability
of rejecting $H_c$.

Hence, if it is determined by random selection to which of a pair of hypotheses
an asymmetric test is going to be applied, the probability of a correct judgment
with the asymmetric test will be less than if there were substituted for it the
UMP symmetric test. It may be concluded that the UMP symmetric test is to
be preferred unless there is prior evidence which permits a tentative selection of
the correct hypothesis with greater-than-chance frequency.

6. Symmetric test when standard deviations of universes are unequal.
Thus far, we have restricted ourselves to the case where $\sigma_x = \sigma_y$. Let us now
relax this condition and see whether a UMP symmetric test for $M_y > M_x$ exists
in this more general case.

We now have for the ratio of $p(E)$ to $p(E')$:

(3)  $$\frac{p(E)}{p(E')} = \exp\left\{-\frac{n}{2\sigma_x^2\sigma_y^2}\left[(\sigma_y^2 - \sigma_x^2)(\mu_x - \mu_y) - 2(\sigma_y^2 M_x - \sigma_x^2 M_y)(\bar{x} - \bar{y})\right]\right\},$$

where

$$\mu_x = \sum_i x_i^2/n,\qquad \mu_y = \sum_i y_i^2/n.$$

Even if $\sigma_y$ and $\sigma_x$ are known, which is not usually the case, there is no UMP
symmetric test for the hypothesis $M_y > M_x$. From (3), the symmetric critical
region which has the lowest probability of errors of the first kind for the
hypothesis ($M_y = k_1$; $M_x = k_2$; $k_1 > k_2$) is the set of points $E$ such that:

(4)  $$(\sigma_y^2 - \sigma_x^2)(\mu_x - \mu_y) - 2(\sigma_y^2 k_2 - \sigma_x^2 k_1)(\bar{x} - \bar{y}) > 0.$$

Since this region is not the same for all values of $k_1$ and $k_2$ such that $k_1 > k_2$,
there is no UMP symmetric region for the composite hypothesis $M_y > M_x$.
This result holds, a fortiori, when $\sigma_y$ and $\sigma_x$ are not known.

If there is no UMP symmetric test for $M_y > M_x$ when $\sigma_y \ne \sigma_x$, we must be
satisfied with a test which is UMP among some class of tests more restricted than
the class of symmetric tests. Let us continue to restrict ourselves to the case
where there are an equal number of observations, in our sample, of (X) and of
(Y). Let us pair the observations $x_i, y_i$, and consider the differences
$u_i = x_i - y_i$. Is there a UMP test among the tests which are symmetric with
respect to the $u_i$’s for the hypothesis that $M_y - M_x = -U > 0$? By a symmetric
test in this case we mean a test such that whenever the point $(u_1, \ldots, u_n)$
falls into region $\omega_0$, the point $(-u_1, \ldots, -u_n)$ falls into region $\omega_1$.

If $x_i$ and $y_i$ are distributed normally about $M_x$ and $M_y$ with standard
deviations $\sigma_x$ and $\sigma_y$ respectively, then $u_i$ will be normally distributed about
$U = M_x - M_y$ with standard deviation $\sigma_u = \sqrt{\sigma_x^2 + \sigma_y^2}$. The ratio of probabilities
for the sample points $E_u: (u_1, \ldots, u_n)$ and $E_u': (-u_1, \ldots, -u_n)$ is given by:

(5)  $$\frac{p(E_u)}{p(E_u')} = \exp\left\{\frac{2n}{\sigma_u^2}\,\bar{u}\,U\right\},$$

where

$$\bar{u} = \frac{1}{n}\sum_i u_i.$$

Hence, $p(E_u) > p(E_u')$ whenever $\bar{u}$ has the same sign as $U$. Therefore, by the
same process of reasoning as in Section 3, above, we may show that $\bar{u} < 0$ is a
UMP test among tests symmetric in the sample space of the $u$’s for the
hypothesis $U < 0$.

It should be emphasized that $\Omega_{su}$, the class of symmetric regions in the space
of $E_u: (u_1, \ldots, u_n)$, is far more restricted than $\Omega_s$, the class of symmetric
regions in the sample space of $E: (x_1, \ldots, x_n; y_1, \ldots, y_n)$. In the latter class
are included all regions such that:

(A) $E': (a_1, \ldots, a_n; b_1, \ldots, b_n)$ falls in $w_0$ whenever
$E: (b_1, \ldots, b_n; a_1, \ldots, a_n)$ falls in $w_1$.

Members of class $\Omega_{su}$ satisfy this condition together with the
further condition:

(B) For all possible sets of $n$ constants $k_1, \ldots, k_n$,
$E: (x_1 + k_1, \ldots, x_n + k_n; y_1 + k_1, \ldots, y_n + k_n)$ falls in $w_0$ whenever
$E: (x_1, \ldots, x_n; y_1, \ldots, y_n)$ falls in $w_0$.

When $\sigma_y \neq \sigma_x$, a UMP test for $M_y > M_x$ with respect to the symmetric
class $\Omega_{su}$ exists, but a UMP test with respect to the symmetric class $\Omega_s$ does
not exist.
ON INDICES OF DISPERSION

By Paul G. Hoel 
University of California, Los Angeles 

1. Introduction. In biological sciences the index of dispersion for the binomial
and Poisson distributions is very useful for testing homogeneity of certain types
of data. For example, the dilution technique in making blood counts finds it
useful. Recently there have been attempts to use it to determine allergies by
observing the change in the blood count after allergic foods have been taken.
Here the sample may consist of only a few readings; consequently it is important
to know how accurate this index is when applied to small samples. After
inspecting the application of the Poisson index to such counts, I was surprised to
see the lack of agreement with theory. At first it appeared that the fault lay
with the chi-square approximation which is used on this index, but later it was
clear that the assumption of a basic Poisson distribution was at fault. It now
appears that statisticians will need to be careful about citing blood counts as
examples of data following a Poisson distribution.

This paper is the result of investigating the accuracy of the chi-square
approximation for the distribution of these indices. Previous work on this problem
seems to have consisted in some sampling experiments [1] for small values of
the parameters involved, and in some theoretical work [2] in which the sampling
distribution is considered only for a fixed sample mean. Although sampling
distributions ordinarily differ very little from the distributions obtained by
assuming the mean of the sample fixed, for small degrees of freedom the
difference may be appreciable and therefore requires investigation. In this paper
the accuracy of the chi-square approximation is investigated by finding
expressions for the descriptive moments of the distribution which are correct to terms
of order $N^{-3}$. These expressions are obtained by means of Fisher's
semi-invariant technique.

2. Moments of the distribution. Employing Fisher's notation [3], let the
binomial index of dispersion be denoted by $z$; then $z$ may be written as:

$z = \dfrac{\sum (x - \bar{x})^2}{\bar{x}\,(1 - \bar{x}/n)} = \dfrac{(N - 1)k_2}{k_1 (1 - k_1/n)}$.

Letting $w = k_1 - m$, $y = k_2$, $a = n - m$, $b = n(N - 1)/(ma)$, $z$ may be
expanded as follows:

$z = \dfrac{by}{(1 + w/m)(1 - w/a)} = b\{y + c_1 wy + c_2 w^2 y + c_3 w^3 y + \cdots\}$,


where the definition of $c_i$ is obvious. As will be seen later, these expansions are
valid for obtaining the expected values of powers of $z$; hence

(1)
$E(z) = b\{\mu_{01} + c_1\mu_{11} + c_2\mu_{21} + \cdots\}$
$E(z^2) = b^2\{\mu_{02} + 2c_1\mu_{12} + (2c_2 + c_1^2)\mu_{22} + (2c_3 + 2c_1c_2)\mu_{32} + \cdots\}$
$E(z^3) = b^3\{\mu_{03} + 3c_1\mu_{13} + (3c_2 + 3c_1^2)\mu_{23} + (3c_3 + 6c_1c_2 + c_1^3)\mu_{33} + \cdots\}$
$E(z^4) = b^4\{\mu_{04} + 4c_1\mu_{14} + (4c_2 + 6c_1^2)\mu_{24} + (4c_3 + 12c_1c_2 + 4c_1^3)\mu_{34} + \cdots\}$.


Since only the first four moments of $z$ are to be found, it will be necessary to
evaluate the $\mu_{ij}$ for $j = 1, 2, 3, 4$ and for $i = 0, 1, 2, \ldots$ as far as necessary to
give the desired degree of accuracy.

First consider the relation between the moments $\mu_{ij}$ and the semi-invariants
$\kappa_{ij}$, which are defined in terms of the $\mu_{ij}$ by the following formal identity in $t$ and $\tau$:

$\exp\left\{\dfrac{\kappa_{10}t + \kappa_{01}\tau}{1!} + \dfrac{\kappa_{20}t^2 + 2\kappa_{11}t\tau + \kappa_{02}\tau^2}{2!} + \cdots\right\} = 1 + \dfrac{\mu_{10}t + \mu_{01}\tau}{1!} + \dfrac{\mu_{20}t^2 + 2\mu_{11}t\tau + \mu_{02}\tau^2}{2!} + \cdots$

Differentiating both sides with respect to $t$ and replacing the exponential factor
by the right member gives an identity which is convenient for evaluating the
$\mu_{i0}$. Differentiating both sides with respect to $\tau$ and making the same
replacement gives an identity which is convenient for evaluating the $\mu_{ij}$ for $j > 0$.

These identities express $\mu_{ij}$ as a sum of products of $\kappa$'s and $\mu$'s, each such
product being of total degree $i$ and $j$ in its subscripts. By repeated substitution,
$\mu_{ij}$ can be expressed as a sum of products of $\kappa$'s only. From Fisher's formulas
each such semi-invariant, $\kappa_{rs}$, can be expressed as a sum of products of
semi-invariants of the basic distribution, each term of which sum is of order $N^{-(r+s-1)}$
in $N$. Hence it follows that the lowest order term, or at least one of the lowest
order terms, in $N$ in the expression for $\mu_{ij}$ will be a term with the maximum
number of $\kappa$ factors. Since the $\kappa_{rs}$ of lowest degree in subscripts are $\kappa_{10}$ and $\kappa_{01}$,
the term with the maximum number of $\kappa$ factors will be the term in $\kappa_{10}^{i}\kappa_{01}^{j}$.
However, since $w = k_1 - \kappa_1$ has a zero mean value, $\mu_{10} = \kappa_{10} = 0$; consequently
the lowest degree term involving the subscript $i > 0$ is $\kappa_{20}$ or $\kappa_{11}$. As a result,
the maximum number of $\kappa$ factors will be found in the term containing $\kappa_{20}^{i/2}\kappa_{01}^{j}$
for $i$ even and $\kappa_{20}^{(i-1)/2}\kappa_{11}\kappa_{01}^{j-1}$ for $i$ odd. These terms are of order $N^{-i/2}$ and
$N^{-(i+1)/2}$ respectively. Since it is desired to obtain accuracy of order $N^{-3}$, it
therefore will suffice to evaluate $\mu_{ij}$ for $i \leq 6$.

The validity of the expansions used in arriving at (1) could now be shown by
writing them as partial sums with remainder terms and then showing that the
remainder terms are of higher order than $N^{-3}$.

Neglecting terms of higher order than $N^{-3}$, the above identities give the
following expressions for $\mu_{ij}$ for $j = 0, 1, 2$ and $i = 0, 1, \ldots, 6$, with slightly longer
expressions for $j = 3$ and 4.


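$\mu_{10} = 0$
$\mu_{01} = \kappa_{01}$
$\mu_{20} = \kappa_{20}$
$\mu_{11} = \kappa_{11}$
$\mu_{30} = \kappa_{30}$
$\mu_{21} = \kappa_{21} + \kappa_{01}\mu_{20}$
$\mu_{40} = \kappa_{40} + 3\kappa_{20}\mu_{20}$
$\mu_{31} = \kappa_{31} + 3\kappa_{11}\mu_{20} + \kappa_{01}\mu_{30}$
$\mu_{50} = 6\kappa_{30}\mu_{20} + 4\kappa_{20}\mu_{30}$
$\mu_{41} = 6\kappa_{21}\mu_{20} + 4\kappa_{11}\mu_{30} + \kappa_{01}\mu_{40}$
$\mu_{60} = 5\kappa_{20}\mu_{40}$
$\mu_{51} = 5\kappa_{11}\mu_{40} + \kappa_{01}\mu_{50}$
$\mu_{61} = \kappa_{01}\mu_{60}$

$\mu_{02} = \kappa_{02} + \kappa_{01}\mu_{01}$
$\mu_{12} = \kappa_{12} + \kappa_{11}\mu_{01} + \kappa_{01}\mu_{11}$
$\mu_{22} = \kappa_{22} + \kappa_{21}\mu_{01} + \kappa_{02}\mu_{20} + 2\kappa_{11}\mu_{11} + \kappa_{01}\mu_{21}$
$\mu_{32} = \kappa_{31}\mu_{01} + 3\kappa_{12}\mu_{20} + 3\kappa_{21}\mu_{11} + \kappa_{02}\mu_{30} + 3\kappa_{11}\mu_{21} + \kappa_{01}\mu_{31}$
$\mu_{42} = 6\kappa_{21}\mu_{21} + \kappa_{02}\mu_{40} + 4\kappa_{11}\mu_{31} + \kappa_{01}\mu_{41}$
$\mu_{52} = 5\kappa_{11}\mu_{41} + \kappa_{01}\mu_{51}$
$\mu_{62} = \kappa_{01}\mu_{61}$.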
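
The differentiation argument above amounts to a recursion that builds bivariate moments from bivariate semi-invariants. The sketch below is an illustration under stated assumptions, not the author's computation: it implements the exact recursion obtained by differentiating with respect to $t$ (and with respect to $\tau$ when $i = 0$), using made-up values of the $\kappa_{rs}$ with $\kappa_{10} = 0$, and leaves out the truncation to order $N^{-3}$. The function name and the dictionary layout are hypothetical.

```python
from math import comb

def moments_from_cumulants(kappa, max_i, max_j):
    """Bivariate moments mu[(i, j)] from semi-invariants kappa[(r, s)].

    Missing entries of kappa are treated as zero. The recursion comes from
    differentiating exp(K) = M once and matching coefficients.
    """
    def k(r, s):
        return kappa.get((r, s), 0.0)

    mu = {(0, 0): 1.0}
    for j in range(max_j + 1):
        for i in range(max_i + 1):
            if (i, j) == (0, 0):
                continue
            if i > 0:  # derivative with respect to t
                mu[(i, j)] = sum(
                    comb(i - 1, a) * comb(j, b) * k(a + 1, b) * mu[(i - 1 - a, j - b)]
                    for a in range(i) for b in range(j + 1))
            else:      # derivative with respect to tau
                mu[(0, j)] = sum(
                    comb(j - 1, b) * k(0, b + 1) * mu[(0, j - 1 - b)]
                    for b in range(j))
    return mu

# Illustrative semi-invariants only; kappa_10 is omitted, i.e. zero.
kappa = {(0, 1): 0.7, (2, 0): 0.1, (1, 1): 0.05, (0, 2): 0.3, (3, 0): 0.02,
         (2, 1): 0.01, (1, 2): 0.04, (2, 2): 0.003, (4, 0): 0.001}
mu = moments_from_cumulants(kappa, max_i=4, max_j=2)
```

With $\kappa_{10} = 0$ the values returned for $\mu_{40}$ and $\mu_{22}$ agree with the corresponding entries listed above, which hold exactly for these two moments.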


The next step is to apply Fisher's formulas expressing the $\kappa_{rs}$ in terms of the
semi-invariants of the basic variable distribution, which in this case is the
binomial distribution. In Fisher's notation $\kappa_{rs}$ would be written as $\kappa(1^r 2^s)$, since
the variables $w$ and $y$ are respectively $k_1$, measured from its expected value, and
$k_2$. Applying such formulas, the following expressions for the $\mu_{i1}$ and $\mu_{i2}$ are
obtained, with somewhat longer expressions for the $\mu_{i3}$ and $\mu_{i4}$.

$\mu_{01} = \kappa_2$,


„ 2 

K 3 *4 . * 2 


MSI 


_*S_ , 4*3*2 

: iv 5 Iv 5 " 1 


25*3 *2 

•*- ~w* 


Mn = 


p 41 = + 4*5 , 3*5 

N 3 N 3 . A'' 2 ’ 

15*5 


iV a 


*4 , 2 

«S = ^ + *2 


( 2 ) 


_ Kb , 2* 3 * 2 

Ml2 “ F 2 + ^r 


+ *] 

[r^i +i ] 


[rh + 1 ] 

+ w [r^i + 0 


_ 5*6 *2 . 7*4 * 3 7*3 *5 

m N 3 " t ‘ W + ~W 


,, _ I®*! K l i 20*5 *2 

**“-~ +-T7T- 


M62 = 


Mt 2 = 


N 3 
_ 40*3 *2 

= ~P~ 

15*2 
iV 3 • 


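It is necessary next to express these $\kappa$'s in terms of the parameters of the binomial
distribution. Here the $\kappa$'s are defined by the following formal identity in $\theta$:

$\exp\left\{\kappa_1\theta + \dfrac{\kappa_2\theta^2}{2!} + \dfrac{\kappa_3\theta^3}{3!} + \cdots\right\} = (q + pe^{\theta})^n$.

Taking logarithms, expanding in powers of $\theta$, and equating coefficients of powers
of $\theta$, the following expressions are obtained: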


$\kappa_1 = m$
$\kappa_2 = mq$
$\kappa_3 = mq(q - p)$
$\kappa_4 = mq(1 - 6pq)$
$\kappa_5 = mq(q - p)(1 - 12pq)$
$\kappa_6 = mq(1 - 30pq + 120p^2q^2)$
$\kappa_7 = mq(q - p)(1 - 60pq + 360p^2q^2)$
$\kappa_8 = mq(1 - 126pq + 1680p^2q^2 - 5040p^3q^3)$.
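
These closed forms can be checked with exact rational arithmetic. The sketch below is illustrative and is not the expansion in powers of $\theta$ used in the text: it computes the raw moments of a binomial distribution directly, converts them to semi-invariants with the standard moment-cumulant recursion, and lists the closed forms above with $m = np$. The function names and the choice $n = 7$, $p = 1/3$ are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def binomial_cumulants(n, p, order):
    """Semi-invariants kappa_1..kappa_order of Binomial(n, p), computed exactly."""
    p = Fraction(p)
    q = 1 - p
    raw = [sum(Fraction(x) ** r * comb(n, x) * p ** x * q ** (n - x)
               for x in range(n + 1)) for r in range(order + 1)]
    kappa = [Fraction(0)] * (order + 1)
    for r in range(1, order + 1):
        # raw[r] = sum_{k=1..r} C(r-1, k-1) * kappa[k] * raw[r-k], solved for kappa[r]
        kappa[r] = raw[r] - sum(comb(r - 1, k - 1) * kappa[k] * raw[r - k]
                                for k in range(1, r))
    return kappa[1:]

def closed_forms(n, p):
    """The eight expressions listed in the text, with m = n * p."""
    p = Fraction(p)
    q = 1 - p
    m, u = n * p, p * q
    return [m,
            m * q,
            m * q * (q - p),
            m * q * (1 - 6 * u),
            m * q * (q - p) * (1 - 12 * u),
            m * q * (1 - 30 * u + 120 * u ** 2),
            m * q * (q - p) * (1 - 60 * u + 360 * u ** 2),
            m * q * (1 - 126 * u + 1680 * u ** 2 - 5040 * u ** 3)]

exact = binomial_cumulants(7, Fraction(1, 3), 8)
listed = closed_forms(7, Fraction(1, 3))
```

Because `Fraction` arithmetic is exact, the two lists can be compared for equality rather than to a tolerance.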
These values of the $\kappa$'s are inserted in (2) to give the following expressions for
the $\mu_{i1}$ and $\mu_{i2}$, with considerably longer expressions for the $\mu_{i3}$ and $\mu_{i4}$.

/ioi = mq 


m = mq(q- p)~ 


/i21 


= mq ( l .-An + m 

1 \ N 2 


) 


- ~ 58 Pg 


Mi = rn q 


N 1 


+ 


3 8/ s 25 
Mi = m q (q — p) —■ 

* 4 15 
Hn = m q — 


/102 — Wl<7 
Mis = Wtffa 


/l - 6pg , 2mg , \ 

V“r- + ir=i + mq ) 

- p)(- 


- 12pg 


+ 


2mq 


MS2 


W 2 ' fV(Af - 1) 1 IV 

_A - 3 0 p 5 + 120pV , 8wi<z(l - bpq) 

— I | 


) 


N z 


2 2/ x/l2 — 102pg 

Mss = m q (5 - p)(-jyj—^ + 


IV 2 (IV - 1) 

, mq( 5 — 26pg) 2vi q~ m a q 2 \ 

N* T N{N - 1) + N ) 


14 mq 


+ 


7 mq 


m = mV 


^36 - 176pg 


N 3 


+ 


tf 2 (IV - I) 1 iV 2 
6mg , 3 mq\ 


) 


+ 


N^N - 1) ' TV 2 / 


■ 4i/ v 40 
M6s = m q (q - p) — 

S C 15 

Mes = m g . 


It remains to express the coefficients of (1) in terms of these same parameters.
From the definition of $c_i$, $a$, and $k_1$, it follows that

$c_i = \dfrac{1}{a^i} - \dfrac{1}{a^{i-1}m} + \cdots + \dfrac{(-1)^i}{m^i} = \dfrac{p^{i+1} + (-1)^i q^{i+1}}{m^i q^i}$.
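
The closed form for $c_i$ can be checked against the expansion of $z$ in powers of $w$. The sketch below is an illustration only, resting on assumptions: it takes the binomial index of dispersion as $\sum (x - \bar{x})^2 / [\bar{x}(1 - \bar{x}/n)]$, takes $b = (N - 1)/(mq)$, and uses a small made-up sample. The function names are hypothetical.

```python
def k_statistics(xs):
    """Sample mean k1 and unbiased variance k2."""
    N = len(xs)
    k1 = sum(xs) / N
    k2 = sum((x - k1) ** 2 for x in xs) / (N - 1)
    return k1, k2

def index_of_dispersion(xs, n):
    """z = (N - 1) k2 / (k1 (1 - k1 / n)), the assumed binomial index."""
    k1, k2 = k_statistics(xs)
    return (len(xs) - 1) * k2 / (k1 * (1 - k1 / n))

def c(i, m, p):
    """Closed form c_i = (p^(i+1) + (-1)^i q^(i+1)) / (m^i q^i)."""
    q = 1 - p
    return (p ** (i + 1) + (-1) ** i * q ** (i + 1)) / (m ** i * q ** i)

def index_by_series(xs, n, p, terms=30):
    """z = b * y * sum_i c_i w^i, with w = k1 - m, y = k2, b = (N - 1) / (m q)."""
    k1, k2 = k_statistics(xs)
    m = n * p
    b = (len(xs) - 1) / (m * (1 - p))
    return b * k2 * sum(c(i, m, p) * (k1 - m) ** i for i in range(terms))

# Hypothetical counts out of n = 10 trials, with p = 0.3 assumed known.
xs = [2, 4, 3, 5, 1, 3, 2, 5]
z_exact = index_of_dispersion(xs, n=10)
z_series = index_by_series(xs, n=10, p=0.3)
```

The series converges quickly here because $w$ is small compared with $m$ and $a$; for sample means far from $m$ more terms would be needed, or the series may fail to converge.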
If now the above values of the $\mu_{ij}$ and $c_i$ are inserted in the expressions (1), the
following final formulas are obtained.

«,) - W-1){! + i+(£)' +(i)V--} 


Etf) - or - i)‘|i + ~i 


2(1 - 6pg) , pqi 2 ~ Upg) 
(iV - l)IVwig ' ( Nmq ) 2 


_ 2(1 + 2pg - 25p a g 2 ) 

(IV - \.)(Nmq) 2 r 

S( 2 3 ) = (IV- + ' 


2pg(l + 3pg — 3Qp 2 g") 
(Nmq) 3 

3£5 , ■ 8 _ _ 

Nmq ' (N — l) 2 



6(1 - 3pg) 
(AT — \)Nmq 


2pg(l — 5pg) 4(1 - ipq)(N - 2) _ 24(1 — 5pg) 
+ (Nmq ) 2 (N~ l) 2 Nmq (N - 1 ) 2 Nmq 

6(1 — llpg + 40pV 6pg(l - 16pg + 55p 2 g a ) 

(N - l)(Nmq ) 2 + (Nmqy 

60pg(l - 4 pq)(N - 2) 1 

(IV — l) 2 (IVwg ) 2 / 


E(* 4 ) = (IV- 1)" |l + 


8pg , 44 

Nmq + (N - l) 2 


12( 1 + 2pg) 

(N — 1 )Nmq 


2pg(2 — 21pg) 16(1 — ipq)(N — 2) 48 8(15 - 46pg) 

(Nmq) 2 + (N - l) 2 Nmq ' (N ~ l) 3 (IV - 1 fNmq 


__ 12(3 — 44pg + 138p 2 g 2 ) . 64pg(l — 4pg)(A r — 2) 

(N - l)(Nmq) 2 h (N^ r rj 2 (Nmqy~ 

96(1 - 4pg)(IV - 2) 8(1 ~ 12pg + 36p 2 g 2 ) (41V 2 - 9iV + 6) 

(N — l) 3 Nmq (N — l) 3 (Nmq) 2 


4pg(l - 43pg + 168p 2 q) 
(Nmq) 3 



By considering the formation of terms, it can also be shown that the above
expressions are correct to terms of order m , m 2 , m\ and wi l) , respectively, in the 
parameter m. If m is large these expressions are considerably more accurate 
than the order N~ 3 would indicate since the lowest order terms neglected in these 
expressions are respectively N*m\ NW, NW, and N*m. 


3. Applications. To compare these moments with those of the chi-square
distribution, consider the ratios of corresponding moments, both for the Poisson
distribution and for the binomial distribution in the special case of $p = 1/2$.
For the Poisson distribution, these ratios are
$R_1 = 1$


R 2 = 1 - - 


Nm (Nm) 2 




R = 1 l f j3 . 1 _ 7 \ 

4 N + 3 3m 2 Nm) ' 

For the binomial distribution with $p = 1/2$, these ratios are

P = 1 I 1 1 4 . _J_ 

1 ^ Nn ^ (Nn) 2 ^ ( Nn ) 3 

Rl = (* “ ri) ( 1 + Wn ~ Whi 2 ) ~ 4(A^)" 3 

Ri = 0 “ 0( x “ 30 + i^( 4 '^ + A) + i\fc 2 (* “ 0 + : 
Ri = 1+ A^3{v + ^ + ^(" 13 + S~S) 

+ J_ (n _ 67 . 51 \ J_ /31 _ 51\ 
iV^ \ 2n ' 2nV T IVW \ 2 2n) ^ 


5 

2(Nn) 2 


JL\ 

2iWj 


From these expressions the following table is constructed. 
For $m > 5$ these ratios are close to unity even for $N$ as small as 5; hence it
appears that the chi-square approximation is satisfactory as long as $m > 5$.
For $m < 2$ most of these ratios differ considerably from unity, particularly
for the binomial distribution. Since $R_1$ is practically constant, the reduction
in $R_2$ here indicates that the chi-square approximation will contain too many
extreme values. For the Poisson distribution there is an increase in $R_4$ to
compensate slightly for this decrease in $R_2$, so that the 5 percent points, for example,
would not differ very much. The use of the chi-square approximation would
therefore tend to give slightly too few significant results when they exist. For
the binomial distribution, however, there is a decrease in both $R_2$ and $R_4$, so
that the distribution tends toward normality; consequently the chi-square
approximation will contain far too many extreme values and the 5 percent point
will be much too large. This situation becomes slightly worse with increasing $N$.

4. Conclusions. From a consideration of the approximations for the first
four moments of the distribution of the index of dispersion, it appears that the
chi-square approximation is highly satisfactory provided that $m > 5$. For
smaller values of $m$, the approximation is still fairly accurate for the Poisson
distribution but not for the binomial distribution. For decreasing small values
of $m$ there is an increasing tendency to claim compatibility between data and
theory when it does not exist; hence the binomial index must be handled
carefully in such situations. These general conclusions are in agreement with the
specialized results of Cochran and Sukhatme.

The semi-invariant technique for problems such as this is exceedingly laborious
and is of questionable accuracy. The coefficients in Fisher's heavier formulas
are so large that increased accuracy comes slowly with increased accuracy of
order of terms. In addition, there are numerous typographical mistakes in
Fisher's formulas, some of which are not easily detected. The formulas (3)
may be used to investigate the accuracy of the chi-square approximation for
situations not covered in the numerical table, but they are of questionable
accuracy, when $m$ is small, for $N$ as small as 5.
ON SERIAL NUMBERS


By E. J. Gumbel 
New School for Social Research 

In this paper we consider a continuous variate and unclassified observations.
It is well known that there are two step functions, which we may trace for a given
series of observations. We will show that the differences between the two ways
of plotting play an important role for certain graphical methods used by
engineers.

To obtain one and only one series of observations we adjust the cumulative
frequencies. The corrections thus introduced depend upon the theoretical
distribution which is adequate for the observations. Later we deal with the
relation between serial numbers and grades. Finally we construct confidence bands
for the comparison between theory and observations.

1. Theory and observations. If we arrange $n$ observations in order of
increasing magnitude, and write each as often as it occurs, there will be a first,
$x_1$, the smallest value, a second, $x_2$, an mth, $x_m$, the penultimate, $x_{n-1}$, and the
last, $x_n$, i.e., the greatest value. The index $m$ is called the observed cumulative
frequency, or simply the rank. It is usual to draw the observations $x_m$ along the
abscissa, and the rank $m$ along the ordinate. The step function starts with a
vertical line from the value $x_1$ of the abscissa to the point with the coordinates
$1, x_1$, and in general consists of the horizontal lines from the point $m, x_m$ to the
point $m, x_{m+1}$ and the vertical lines from the point $m, x_{m+1}$ to the point $m + 1,
x_{m+1}$. The step function ends with the point $n, x_n$. We call this graph the
step function $(m, x_m)$. However, another step function which is derived from
the observations arranged in decreasing magnitude is equally legitimate. This
step function starts from the point with the coordinates $0, x_1$, and in general
consists of the horizontal lines from the point $m - 1, x_m$ to $m - 1, x_{m+1}$ and the
vertical lines from the point $m - 1, x_{m+1}$ to the point $m, x_{m+1}$ and ends with the
point $n - 1, x_n$. We call it the step function $(m - 1, x_m)$. Let $F(x)$ be the
probability of a value equal to or less than $x$. Then the continuous theoretical
curve, the ogive, which we compare to the step functions is $nF(x), x$. The
question is whether we have to use the step function $(m, x_m)$ or the step function
$(m - 1, x_m)$.

The differences between the two ways of plotting are rarely mentioned in the
statistical literature. If we plot instead of the rank $m$ the reduced rank $m/n$,
the differences between the two ways of plotting are of the order $1/n$. It is
generally tacitly assumed that this difference may be neglected. This will not
hold if $n$ is small.

In the following we show two other ways of plotting the observations where
the differences between the two observed curves play an important role.
Suppose that the probability $F(x)$ and the density of probability, $f(x)$, henceforth
called the distribution, are such that it is possible to introduce a reduced variate

(1) $z = \dfrac{x - a}{b}$

which has no dimension. In general, the constant $a$ will be a certain mean, and
the constant $b$ a certain measure of dispersion. Furthermore, the constants may
be linear functions of these characteristics. Neither the probability $G(z)$ of a
value equal to or less than $z$

(2) $G(z) = F(x)$,
nor the reduced distribution

(3) $g(z) = bf(a + bz)$

contain constants. The equiprobability test consists in the following procedure:
We attribute to the mth observation $x_m$ the relative frequency $m/n$, and
determine from a probability table a value $z$, such that

(4) $G(z) = m/n$.

The variate $x$ is plotted on the ordinate, and the reduced variate $z$ on the
abscissa. Then the points $x_m, z$ must be situated close to the straight line (1).
To apply this comparison between theory and observations, we need not even 
calculate the constants. For the normal distribution the application of this test 
is greatly facilitated by the use of probability paper. 

The difficulty is that we may as well choose the frequency 

(4') G(z) = (m - 1 )/n, 

and determine the corresponding values of z. Therefore, we have two lines (1) 
instead of one. The difference between the two series will be large for the 
first and last few observations. For the first series the last observation cannot 
be plotted on probability paper; for the second series the first observation can¬ 
not be plotted. 
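
The two competing series (4) and (4') can be illustrated for the normal distribution. The sketch below is a hedged illustration rather than the procedure of any cited author: it uses Python's `statistics.NormalDist.inv_cdf` in place of a probability table, and returns `None` where the frequency is 0 or 1 and no finite reduced variate exists. The function name and the `shift` parameter are assumptions.

```python
from statistics import NormalDist

def reduced_variates(n, shift):
    """Reduced normal variates z solving G(z) = (m - shift) / n for m = 1..n.

    shift = 0 gives the series (4); shift = 1 gives the series (4').
    A frequency of 0 or 1 has no finite z, so None is returned for it.
    """
    std = NormalDist()
    values = []
    for m in range(1, n + 1):
        p = (m - shift) / n
        values.append(std.inv_cdf(p) if 0 < p < 1 else None)
    return values

first_series = reduced_variates(5, shift=0)   # last observation cannot be plotted
second_series = reduced_variates(5, shift=1)  # first observation cannot be plotted
```

For `n = 5` the two series share their interior values, offset by one rank, which shows how strongly the choice affects the first and last few observations.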

The same difficulty exists for the "return period." If the observations of a
continuous variate are made at regular intervals in time which are taken as units,
we may as in [4] define the theoretical return periods $T(x)$ of a value equal to or
greater than $x$ as

(5) $T(x) = \dfrac{1}{1 - F(x)}$.


The comparison of the theoretical with the observed return periods gives a test
for the validity of a theory. However, there are two series of observations,
namely, the exceedance intervals

(6) $'T(x_m) = \dfrac{n}{n - m}$; $m = 1, 2, \ldots, n - 1$

and the recurrence intervals

(7) $''T(x_m) = \dfrac{n}{n - m + 1}$; $m = 1, 2, \ldots, n$.

The two expressions (6) and (7) differ widely for the high ranks. The
penultimate observation, for example, has an exceedance interval $n$, whereas the
recurrence interval is only $n/2$. This contradiction and the difficulty arising for the
equiprobability test show that the question of choosing the observed cumulative
frequency of the mth observation has a practical significance.

The equiprobability test and the comparison between the observed and the 
theoretical return period may be combined on probability paper. The variate x 
is plotted on the ordinate, the reduced variate y on the abscissa. But instead of 
y we write the probability F(x) and the return period T(x). If the theory holds, 
the observations must be scattered around the straight line (1). 

But all these methods presuppose that we know whether we have to attribute
to $x_m$ the rank $m$ or the rank $m - 1$. Sometimes a compromise has been
proposed which consists in attributing to $x_m$ neither $m$ nor $m - 1$, but the arithmetic
mean of both, $m - \frac{1}{2}$. In other words, the index $m$ is no longer considered to be
an integer. In such cases, we call $m$ the serial number.

The corrected frequency $m - \frac{1}{2}$ may be accepted for the comparison between
the step function and the probability curve. However, for the return period
and for the equiprobability test this method leads to serious difficulties. The
corrected return periods, which have been proposed by Hazen [7] and have been
used by M. Kimball [8], are

(8) $T(x_m) = \dfrac{n}{n - m + 1/2}$.

The last among $n$ observations has a return period $2n$. This idea does not seem
to be sound. No statistical device can increase the number of observations
beyond $n$.
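
The three observed return periods discussed so far are easy to tabulate side by side. The sketch below is illustrative; the function names are assumptions, and the formulas are (6), (7), and the corrected return period $n/(n - m + \frac{1}{2})$ attributed above to Hazen.

```python
def exceedance_interval(n, m):
    """Formula (6): n / (n - m), defined for m = 1, ..., n - 1."""
    return n / (n - m)

def recurrence_interval(n, m):
    """Formula (7): n / (n - m + 1), defined for m = 1, ..., n."""
    return n / (n - m + 1)

def corrected_return_period(n, m):
    """Compromise based on the serial number m - 1/2: n / (n - m + 1/2)."""
    return n / (n - m + 0.5)

n = 20
penultimate = (exceedance_interval(n, n - 1), recurrence_interval(n, n - 1))
largest = corrected_return_period(n, n)
```

With 20 observations the penultimate value has an exceedance interval of 20 but a recurrence interval of 10, and the corrected return period of the largest value is 40, twice the number of observations.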

2. The adjusted frequency of the mth observation. The use of $m$, $m - 1$, or
$m - \frac{1}{2}$ as frequency of the mth observation amounts to considering the mth
value as being fixed. To obtain one and only one step function we consider $x_m$
as a statistical variate. This will lead to the determination of the most probable
serial number and of the corresponding probability as a function of $m$ and $n$.

The mth observation is such that there are $m - 1$ observations below it and
$n - m$ observations above it. Consequently, the distribution $w_n(x, m)$ of the
mth observation is

(9) $w_n(x, m) = \dbinom{n}{m} m [F(x)]^{m-1} [1 - F(x)]^{n-m} f(x)$.
The variate $x_m$ is simply called $x$ as each value of $x$ has a certain density of
probability of being the mth. To distinguish between $f(x)$ and $w_n(x, m)$, the first
distribution is referred to as the initial distribution. For some simple initial
distributions it is possible to calculate exactly the mean and the standard error
of the distribution (9). This has been done by Karl Pearson [10] for the normal,
the uniform, the exponential, and other skew distributions. The results are very
complicated, and do not allow any immediate practical applications.

In the following we determine therefore instead of the mean the mode of the
mth value. The most probable mth value, for which we write $\tilde{x}_m$, is the solution
of

$\dfrac{d \log w_n(x, m)}{dx} = 0$.

We obtain from (9)

(10) $\dfrac{m - 1}{F(\tilde{x}_m)} f(\tilde{x}_m) - \dfrac{n - m}{1 - F(\tilde{x}_m)} f(\tilde{x}_m) = -\dfrac{f'(\tilde{x}_m)}{f(\tilde{x}_m)}$.


In this equation m is counted in order of increasing magnitude. If we choose 
the inverse order we obtain the same equation, if we replace the index m by 
n — m + 1. Therefore the following results are independent of the order of 
counting m. 

Equation (10) gives the most probable value x m as a function of m and n. 
The function depends upon the distribution. 

A rough, first trial solution of (10) may be obtained if we confine our interest
to values where neither $m$ nor $n - m$ is small in comparison to $n$, that is, values
which are not extreme. Suppose $m$ to be of the order $n/2$. For increasing
numbers of observations, the expression on the left side of (10) becomes large
compared to the right side provided the derivative remains finite, as is generally the
case. If we neglect the right-hand member, $\tilde{x}_m$ is the solution of

(11) $F(\tilde{x}_m) = \dfrac{m - 1}{n - 1}$.

This expression holds for the uniform distribution where $f'(x) = 0$.
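
For the uniform distribution on the unit interval, where $F(x) = x$ and $f(x) = 1$, the statement can be checked directly. The sketch below is an illustration with hypothetical function names: it evaluates the distribution (9) of the mth observation on a fine grid and locates its maximum, which should fall at $(m - 1)/(n - 1)$.

```python
from math import comb

def order_stat_density_uniform(x, n, m):
    """Distribution (9) for the uniform case: C(n, m) * m * x^(m-1) * (1-x)^(n-m)."""
    return comb(n, m) * m * x ** (m - 1) * (1 - x) ** (n - m)

def grid_mode(n, m, steps=100_000):
    """Grid point in [0, 1] at which the density of the mth observation is largest."""
    grid = (k / steps for k in range(steps + 1))
    return max(grid, key=lambda x: order_stat_density_uniform(x, n, m))

n, m = 10, 4
numerical_mode = grid_mode(n, m)
predicted_mode = (m - 1) / (n - 1)
```

The grid search is a crude stand-in for solving (10); it is adequate here because the density has a single interior maximum when $1 < m < n$.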

The following exact solution is valid for any number of observations and any
serial number. Equation (10) will be used in two different ways: First, we
suppose $m$ to be known, we determine the probability $F(\tilde{x}_m)$ of the most probable
mth value as a function of $m$ and $n$, and attribute this probability to the mth
observation $x_m$. By doing so, the probability of the most probable mth value
becomes the adjusted frequency of the mth observation. This leads to one and
only one series of observations, and settles our initial question. Later, in
Section 3, we suppose $F(\tilde{x}_m)$ to be known, and compute the corresponding most
probable mth observation. This leads to an estimate of the grades (or partition
values) from the serial numbers.
To obtain $F(\tilde{x}_m)$ from (10) we introduce an expression $\sigma^2(\tilde{x}_m)$ by stating

(12) $[\sigma^2(\tilde{x}_m)n] = F(\tilde{x}_m)[1 - F(\tilde{x}_m)]f^{-2}(\tilde{x}_m)$.

The brackets are meant to indicate that the product on the left side does not
depend upon $n$. We shall show later that $\sigma^2(\tilde{x}_m)$ is under certain conditions
the variance of the mth observation. For the present purpose, however, this
significance is not required. Multiplication of (10) by (12), together with a factor
$f(\tilde{x}_m)$, leads to

(13) $m - 1 + F(\tilde{x}_m) - nF(\tilde{x}_m) = -f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)n]$,

or

(14) $F(\tilde{x}_m) = \dfrac{m - 1}{n - 1} + \dfrac{f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)n]}{n - 1}$.

The adjusted frequency in (14) is similar to (11). Another expression for the
adjusted frequency, derived from (13), is

(15) $F(\tilde{x}_m) = \dfrac{m - \frac{1}{2}}{n} + \dfrac{1}{n}\left(F(\tilde{x}_m) - \tfrac{1}{2} + f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)n]\right)$.

The adjusted frequency is the compromise $(m - \frac{1}{2})/n$ plus an expression

(16) $\dfrac{D}{n} = \dfrac{1}{n}\left(F(\tilde{x}_m) - \tfrac{1}{2} + f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)n]\right)$.

The correction, $D$, defined by (16) depends upon the initial distribution and has
no dimension. In general, it will depend upon the constants which exist in the
distribution. If the distribution $f(x)$ may be written in a reduced form (3),
the correction$^1$

(17) $D = G(z) - \tfrac{1}{2} + g'(z)[\sigma^2(z)n]$

depends only upon the dimensionless reduced variate $z$. For a given initial
distribution we choose numerical values for the probability $G(z) = F(\tilde{x}_m)$, calculate
$g'(z)$ and

(18) $[\sigma^2(z)n] = \dfrac{G(z)(1 - G(z))}{g^2(z)}$.

From (16) we compute a table for the corrections $D$ as a function of the adjusted
frequencies $F(\tilde{x}_m)$ and obtain for given $n$ the serial number $m$ as a function of
the adjusted frequencies by

(19) $m = nF(\tilde{x}_m) + \tfrac{1}{2} - D$.

These serial numbers will not be integers. The adjusted frequency $F(\tilde{x}_m)$ for
the mth observation will then be obtained by linear interpolation.


1 In previous articles [3, 6] we started from another interpretation of the corrected 
frequencies and obtained slightly different corrections. 
The value and the sign of the correction $D$ depend upon the distribution.
For the asymmetrical exponential distribution, for example, the correction

(19') $D = -\tfrac{1}{2}$

is independent of the variate. This means that we have to use exclusively the
step function $(m - 1, x_m)$ as being the best way of plotting. The observed
adjusted return periods are the recurrence intervals.

For a symmetrical reduced distribution we have

(20) $1 - G(-z) = G(z)$; $g(-z) = g(z)$; $g'(-z) = -g'(z)$.

Therefore, the reduced correction will be

(21) $D(-z) = -D(z)$.

For the two reduced values $z$ and $-z$ of a symmetrical variate the corrections
have the same size, but different signs.
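
The correction can be evaluated numerically for the two cases just mentioned. The sketch below is illustrative and assumes the reduced form $D = G(z) - \frac{1}{2} + g'(z)\,G(z)(1 - G(z))/g^2(z)$; it uses the standard normal distribution as the symmetrical example and $g(z) = e^{-z}$ as the exponential one. All function names are hypothetical.

```python
from math import exp
from statistics import NormalDist

def correction(G, g, g_prime):
    """D = G - 1/2 + g' * G * (1 - G) / g^2 for a reduced distribution."""
    return G - 0.5 + g_prime * G * (1 - G) / g ** 2

def normal_correction(z):
    """Correction for the standard normal, where g'(z) = -z * g(z)."""
    std = NormalDist()
    g = std.pdf(z)
    return correction(std.cdf(z), g, -z * g)

def exponential_correction(z):
    """Correction for g(z) = exp(-z), z >= 0, where G = 1 - g and g' = -g."""
    g = exp(-z)
    return correction(1 - g, g, -g)

def adjusted_frequency(m, n, D):
    """Adjusted frequency (m - 1/2 + D) / n of the mth observation."""
    return (m - 0.5 + D) / n

d_plus, d_minus = normal_correction(1.3), normal_correction(-1.3)
d_exponential = exponential_correction(0.7)
```

For the exponential case the correction of $-\frac{1}{2}$ turns the adjusted frequency into $(m - 1)/n$, in line with the preference for the step function $(m - 1, x_m)$ stated above.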

A relation similar to (21) holds for two asymmetrical reduced distributions
$g_1(z)$ and $g_2(z)$, which are symmetrical one to another in the sense

(22) $G_1(z) = 1 - G_2(-z)$; $g_1(z) = g_2(-z)$; $g_1'(z) = -g_2'(-z)$.

Then, the corrections are

(23) $D_1(-z) = -D_2(z)$.

For any initial distribution $f(x)$ we read from (19) the adjusted frequency

(24) $F(\tilde{x}_m) = \dfrac{m - \frac{1}{2} + D}{n}$

even for a small number of observations. The question whether to choose $m/n$
or $(m - 1)/n$ as observed cumulative frequency is settled by (24). We obtain
one observed step function, one series for the equiprobability test, and one
series of observed return periods

(25) $T(x_m) = \dfrac{n}{n - m + \frac{1}{2} - D}$,

which have to be compared to the theoretical continuous curves.


3. Estimates for the grades. In the following we use the fundamental
formula (15) to determine interesting grades through the mth values.

We use the term grade for the value of a statistical variate which corresponds
to a given cumulative probability $F(x)$, say, $F(x) = \frac{1}{4}, \frac{1}{2}, \frac{3}{4}$ for quartiles; $F(x) =
\frac{1}{10}, \ldots, \frac{9}{10}$ for deciles, and so on. For a given grade, the probability $F(x)$, the
density of probability $f(x)$ and its derivative are known, and $m$ is unknown.
The value of $m$ obtained from (15), henceforth called the most probable serial
number $\tilde{m}$, is the solution of

(26) $\tilde{m} = nF(x) + 1 - F(x) - f'(x)F(x)(1 - F(x))f^{-2}(x)$.

The corresponding "observed" value is obtained by interpolation between
two observed values $x_{m-1}$ and $x_m$, such that

$m - 1 < \tilde{m} < m$.

For the median, $x_0$, the most probable serial number is

(27) $\tilde{m}_0 = \dfrac{n + 1}{2} - \dfrac{f'(x_0)}{4f^2(x_0)}$.

The median $x_0$ itself enters into (27). It has to be eliminated through the
condition $F(x_0) = \frac{1}{2}$. For the exponential distribution, for example, we find

(27') $\tilde{m}_0 = \dfrac{n}{2} + 1$.

The most probable serial number of the median for a symmetrical distribution is

(28) $\tilde{m}_0 = \tfrac{1}{2}(n + 1)$.

This is the usual estimate of the median for any distribution. The estimate
obtained from (27) is smaller (larger) than the usual estimate if the median is
smaller (larger) than the mode. The difference between the two estimates is
due to the fact that (27) makes use of information about the theoretical
distribution, whereas this information (if available) is neglected by the usual method.
For symmetrical distributions the most probable serial numbers $\tilde{m}_1$ and $\tilde{m}_2$
for two symmetrical grades defined by $F_1$ and $F_2 = 1 - F_1$ are according to
(26) related by

(29)
$\tilde{m}_1 = nF_1 + 1 - (F_1 + f_1'F_1(1 - F_1)/f_1^2)$
$\tilde{m}_2 = n(1 - F_1) + (F_1 + f_1'F_1(1 - F_1)/f_1^2)$.

The members in brackets have the same size, but opposite signs. Another
expression for $\tilde{m}_2$ is

$\tilde{m}_2 = (n + 1) - [nF_1 + 1 - F_1 - f_1'F_1(1 - F_1)/f_1^2]$

so that, for symmetrical distributions

(30) $\tilde{m}_1 + \tilde{m}_2 = n + 1$.

This is to be expected as the mth value counted upwards is the $(n - m + 1)$st
value counted downwards.

For the two quartiles $q_1$ and $q_2$ the most probable serial numbers $\tilde{m}(q_1)$ and
$\tilde{m}(q_2)$, obtained from (29), are

(31) $\tilde{m}(q_1) = \dfrac{n + 3}{4} - \dfrac{3f'(q_1)}{16f^2(q_1)}$; $\tilde{m}(q_2) = \dfrac{3n + 1}{4} - \dfrac{3f'(q_2)}{16f^2(q_2)}$,

where $q_1$ and $q_2$ have to be eliminated by the use of

$F(q_1) = \tfrac{1}{4}$; $F(q_2) = \tfrac{3}{4}$.

For the uniform, the normal and the exponential distribution we obtain the two
quartiles from

(31')
$\tilde{m}(q_1) = \dfrac{n + 3}{4}$; $\tilde{m}(q_2) = \dfrac{3n + 1}{4}$
$\tilde{m}(q_1) = \dfrac{n}{4} + .352$; $\tilde{m}(q_2) = \dfrac{3n}{4} + .648$
$\tilde{m}(q_1) = \dfrac{n}{4} + 1$; $\tilde{m}(q_2) = \dfrac{3n}{4} + 1$

respectively. The last result may also be found from (19') and (24). These
estimates differ from the usual estimates for the reason given above.
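
The quartile constants for the normal distribution can be reproduced from formula (26). The sketch below is illustrative and uses Python's `statistics.NormalDist` in place of a probability table; the function names and the sample size $n = 100$ are assumptions.

```python
from statistics import NormalDist

def most_probable_serial_number(n, F, f, f_prime):
    """Formula (26): n F + 1 - F - f' F (1 - F) / f^2."""
    return n * F + 1 - F - f_prime * F * (1 - F) / f ** 2

def normal_serial_number(n, F):
    """Most probable serial number of the grade F for a standard normal variate."""
    std = NormalDist()
    z = std.inv_cdf(F)
    f = std.pdf(z)
    return most_probable_serial_number(n, F, f, -z * f)  # f'(z) = -z f(z)

n = 100
lower_quartile = normal_serial_number(n, 0.25)
upper_quartile = normal_serial_number(n, 0.75)
median = normal_serial_number(n, 0.5)
```

For $n = 100$ the two quartile serial numbers come out close to $25.352$ and $75.648$, they add up to $n + 1$ as (30) requires, and the median gives $(n + 1)/2$.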

We now apply the notion of a grade to certain characteristics which are
otherwise defined. A certain characteristic, say, the mode $\tilde{x}$ or the mean $\bar{x}$, has for a
given distribution the probability $F(\tilde{x})$ or $F(\bar{x})$ respectively. These
probabilities may be used to define a grade. We determine the corresponding mth value
from (26), and obtain an estimate of the mode or the mean, interpreted as grades,
by interpolation between the observed mth values. For a symmetrical
distribution these estimates of the mode and mean are identical with the estimates
of the median. For an asymmetrical distribution, the most probable serial
number $\tilde{m}(\tilde{x})$ of the mode becomes according to (26)

(32) $\tilde{m}(\tilde{x}) = (n - 1)F(\tilde{x}) + 1$.

Usually, the mode $\tilde{x}$ of a continuous variate is estimated by another procedure.
The observations are arranged in certain cells. One of them has the largest
relative frequency. It will contain the mode. To find its position within the
cell, an interpolation formula is applied which reproduces the content of this
cell and of the two adjacent cells. By choosing different lengths for the cells
and different origins for the classification, the mode can be shifted to the right or
to the left. Formula (32) furnishes a determination of the mode from the
observations according to the theory, such that the arrangement of the
observations into different cells is not needed. Of course, this method can be applied
only if we know the theoretical distribution $f(x)$. The problem of how to estimate
the mode is important for distributions where one of the constants may be
interpreted as the mode or as a function of the mode [1, 4].

4. Standard errors of the estimates. The numerical work involved in the
method (26) of estimating the grades is very small. To obtain the standard
errors of these estimates we consider the asymptotic properties of the distribution
(9). The following results hold therefore only for large numbers of observations.
Besides, we assume that the serial number m is of the order n/2, i.e. not
extreme. It has been shown [2] that under these conditions the distribution
of the mth value converges toward a normal distribution with a standard error
$\sigma(x_m)$, where

(33) $\sigma(x_m)\sqrt{n} = \dfrac{\sqrt{F(x)\,(1 - F(x))}}{f(x)}.$

Although this standard error does not contain m explicitly, it has a clear meaning
for any value of x, as we know from (26) which observation we have to attribute
to the probability F(x). The classical proof of the approximate normality
of the distribution of the median in large samples is a special case of this convergence,
and the classical standard error of the median $\breve{x}$,

(34) $\sigma(\breve{x})\sqrt{n} = \dfrac{1}{2 f(\breve{x})},$

is a special case of (33). The square root in (33) is maximum for F(x) = 1/2.
Therefore,

(35) $\sigma(x_m)\sqrt{n} \le \dfrac{1}{2 f(x)}.$

If the variate x may be reduced through the linear transformation (1), the
standard error $\sigma(z)$ of the reduced variate, called the reduced standard error,

(36) $[\sigma(z)\sqrt{n}] = \dfrac{1}{g(z)}\,\sqrt{G(z)\,(1 - G(z))},$

may be calculated as a function of z, where z corresponds to $x_m$. To call attention
to the fact that these numerical values do not depend upon n, they are
written in brackets. The standard error of the estimate for $x_m$ is, according to
(2) and (3),

(37) $\sigma(x_m) = \dfrac{b}{\sqrt{n}}\,[\sigma(z)\sqrt{n}].$


Since the constant b is a measure of dispersion, the standard error of the estimate
of the mth value is proportional to the standard deviation of the variate. The
factors b and $1/\sqrt{n}$ show that the standard error of the mth value is of the same
structure as the standard error of the arithmetic mean.

For symmetrical distributions the standard error (33) of the mth value is also
a symmetrical function. The standard errors of the estimates of the two quartiles,
and generally of the estimates of two grades defined by F and 1 - F, are
then identical. If the mode coincides with the median, the corresponding standard
error of the mth value is a minimum. For a symmetrical U-shaped distribution,
however, where the density of probability passes through a minimum at
the center of symmetry, the median has the largest standard error among the
mth values. An example of such a distribution has been given by Leavens [9].
As the distribution of the mth value converges towards a normal distribution,
it is legitimate to attribute to the mode of the mth value the standard error (33).
Therefore, for a large number of observations (33) gives the standard error of
our estimate of the grades. The standard errors of the estimates (31) of the
quartiles are

(38) $\sigma(q_1)\sqrt{n} = \dfrac{\sqrt{3}}{4 f(q_1)}; \quad \sigma(q_3)\sqrt{n} = \dfrac{\sqrt{3}}{4 f(q_3)}.$

The arithmetic mean in its usual definition is not an mth value. Its standard
error $\sigma(\bar{x})$, where

(39) $\sigma(\bar{x})\sqrt{n} = \sigma,$

will, therefore, fall outside of the range of the standard errors of the mth values.
(See graph 1.) If the distribution f(x) is such that the standard deviation
does not exist, it is legitimate to estimate the arithmetic mean as a grade, and
calculate it from the corresponding most probable mth value by introducing
$F(\bar{x})$, $f(\bar{x})$ and $f'(\bar{x})$ into (26). The standard error of the arithmetic mean
interpreted as a grade is

(40) $\sigma(\bar{x})\sqrt{n} = \dfrac{1}{f(\bar{x})}\,\sqrt{F(\bar{x})\,(1 - F(\bar{x}))}.$

If we use this estimate of the arithmetic mean for distributions where $\sigma$ exists,
the usual determination of the mean will be more (less) precise than its estimate
as a grade if

(41) $\sigma f(\bar{x}) \lessgtr \sqrt{F(\bar{x})\,(1 - F(\bar{x}))}.$

The standard error of the mode estimated as a grade is

(42) $\sigma(\tilde{x})\sqrt{n} = \dfrac{1}{f(\tilde{x})}\,\sqrt{F(\tilde{x})\,(1 - F(\tilde{x}))}.$

As the standard error of any characteristic depends upon the way it is estimated 
from the observations, the standard errors of the mode or mean interpreted as 
grades differ from the usual standard errors. 


5. The most precise grade. Equation (33) may be used to define a new grade
which has interesting properties. The standard error (33) of the estimate of the
mth value is a function of F. We ask whether it possesses a minimum (maximum).
The corresponding value of the variate, x, may be called the most
(least) precise mth value or the most (least) precise grade. To obtain it, it is
sufficient to calculate from (33)

$2\,\dfrac{d \log \sigma(x_m)}{dx} = \dfrac{2}{\sigma(x_m)}\,\dfrac{d\sigma(x_m)}{dx}.$

Therefore the most (least) precise grade is the solution of

(43) $\dfrac{f(x)}{F(x)} - \dfrac{f(x)}{1 - F(x)} - \dfrac{2 f'(x)}{f(x)} = 0.$

This expression does not vanish if only one of the conditions F(x) = 1/2 and
f'(x) = 0 holds. It vanishes if both conditions hold simultaneously. For a
symmetrical distribution passing through a mode (minimum), the mode (minimum),
estimated as a grade, is the most (least) precise grade. Equation (43) may be written

$\dfrac{f'(x)\,F(x)\,(1 - F(x))}{f^2(x)} = \tfrac{1}{2} - F(x).$

If we introduce this expression into (16), we obtain D = 0
and

(44) $F(x) = \dfrac{m - \tfrac{1}{2}}{n}.$

The most precise mth value is such that the adjusted frequency is the arithmetic mean
of the frequencies m/n and (m - 1)/n.

The most precise mth value x cannot be calculated from the observations
alone. It may be estimated in the same way as any grade by introducing the
values F(x), f(x) and f'(x) into equation (26).

To show the difference between the most precise grade and the mode we apply
the procedure developed above to a skew distribution. The reduced distribution
of the largest value g(y) and the probability G(y) are

(45) $g(y) = e^{-y}\,G(y); \quad G(y) = e^{-e^{-y}}.$

The relation (1) between the reduced variate, for which we write y instead of z,
and the largest value x is

(46) $x = u + \dfrac{y}{\alpha},$

where $u = \tilde{x}$ is the mode and

(47) $\dfrac{1}{\alpha} = \dfrac{\sqrt{6}}{\pi}\,\sigma.$

The most probable serial number $\tilde{m}(u)$ of the mode, obtained from (32), is

(48) $\tilde{m}(u) = \dfrac{n + e - 1}{e}.$

This equation may be used for an estimate of the constant u.

The reduced variance $\sigma^2(y)\,n$ obtained from (36) and (45) is

(49) $\sigma^2(y)\,n = e^{2y}\left(e^{e^{-y}} - 1\right).$

A table for the reduced standard error $\sigma(y)\sqrt{n}$ has been given in a previous
publication [6]. The value $\sigma(y)\sqrt{n}$ is plotted in figure 1 for probabilities G(y)
from 0.01 to 0.95. The standard error has a minimum for a value of y located
to the left of the mode y = 0. On the same graph are plotted the reduced
standard errors for the normal distribution. As the normal reduced variate z
differs from the reduced variate y, two different scales are used for the variates.
The standard error of the estimate (48) of the mode interpreted as a grade,
obtained by introducing y = 0 into (49), is

(49') $\sigma(u)\sqrt{n} = \dfrac{\sqrt{6}}{\pi}\,\sqrt{e - 1}\;\sigma = 1.02205\,\sigma.$

The most precise grade is

$x = u + \dfrac{y}{\alpha},$

where y is the value of the reduced variate for which the standard error (49)
is minimum. We obtain from (49) and (45) the numerical values

(50) $y = -.46601; \quad G(y) = .20319; \quad \sigma(x)\sqrt{n} = .96887\,\sigma.$

The standard error of the most precise grade is 3 per cent smaller; the standard
error of the mode, estimated as a grade, is 2 per cent larger than the standard
error of the mean.
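The numerical values in (49') and (50) can be checked with a short computation. The following Python sketch assumes the reduced variance has the form $e^{2y}(e^{e^{-y}} - 1)$, which is what (36) and (45) give, and locates its minimum by a golden-section search. The function names and the search interval are illustrative choices, not part of the original paper.

```python
import math

def reduced_variance(y):
    """Reduced variance sigma^2(y) * n for the largest-value distribution, as in (49)."""
    return math.exp(2.0 * y) * (math.exp(math.exp(-y)) - 1.0)

def most_precise_reduced_variate(lo=-2.0, hi=1.0, tol=1e-12):
    """Golden-section search for the minimum of the reduced variance on [lo, hi].

    The interval is an assumption; the function has a single minimum there.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if reduced_variance(c) < reduced_variance(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

# 1/alpha expressed in units of the standard deviation sigma, from (47).
SCALE = math.sqrt(6.0) / math.pi

y_min = most_precise_reduced_variate()
G_min = math.exp(-math.exp(-y_min))                        # probability at the minimum
se_min = SCALE * math.sqrt(reduced_variance(y_min))        # in units of sigma
se_mode = SCALE * math.sqrt(reduced_variance(0.0))         # mode, y = 0

print(f"y = {y_min:.5f}, G(y) = {G_min:.5f}")
print(f"most precise grade: {se_min:.5f} sigma; mode: {se_mode:.5f} sigma")
```

Both standard errors are expressed as multiples of $\sigma$ and still carry the factor $1/\sqrt{n}$, so they compare directly with the value $\sigma$ in (39) for the arithmetic mean.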
6. Confidence bands. The standard errors (33) of the grades may be used in
a general way for the construction of confidence bands obtained from curves which
control the fit between theory and observation. Consider first the observed
step function $m - \tfrac{1}{2}$, $x_m$ and the theoretical ogive nF(x), x. The variate x is
plotted along the abscissa, the cumulative frequency along the ordinate. Now,
for large n any theoretical value x which is not extreme may be interpreted as
an mth value having a normal distribution and a standard error $\sigma(x_m)$. At each
point of the graph of nF(x), x which is not extreme, we construct a segment of
length $2\sigma(x_m)$ parallel to the x axis, the midpoint of the segment being on the
theoretical ogive. In other words, we add the standard error $\sigma(x_m)$ to, and subtract
it from, any corresponding value x, and attribute nF(x) to the beginning
and end of these intervals. By this procedure we obtain two curves nF(x),
$x \mp \sigma(x_m)$. For each observation there exists a probability P = .68268 that it
will be contained within the interval $x \mp \sigma(x_m)$.

If we apply another hypothesis to the same observations, or choose other
values for the constants, we reach, of course, other control curves. Of two competing
hypotheses the one is to be preferred where the band contains a larger
number of observations.

The same method may be applied to the equiprobability test and to the comparison
of the observed and theoretical return periods [6]. This procedure is
legitimate for all values which are not extreme.

In the following, we construct the confidence bands for the normal distribution

(51) $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \bar{x})^2 / 2\sigma^2}.$

The variate x is related to the reduced variate z by (1), which, in this case,
becomes

(52) $x = \bar{x} + \sigma\sqrt{2}\,z.$

The probability G(z) is

(53) $G(z) = \tfrac{1}{2}\,(1 + \Phi(z)),$

where $\Phi(z)$ stands for the Gaussian integral

(54) $\Phi(z) = \dfrac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt.$

Formulae (36) and (53) lead to the reduced standard error

(55) $[\sigma(z)\sqrt{n}] = \dfrac{\sqrt{\pi}}{2}\,e^{z^2}\,\sqrt{1 - \Phi^2(z)},$

given in the table, col. 6. The standard errors $\sigma(x_m)$ of the mth values obtained
from (37), (52) and (55) are

(56) $\sigma(x_m) = \dfrac{\sigma\sqrt{2}}{\sqrt{n}}\,[\sigma(z)\sqrt{n}].$

As a numerical example we choose the annual precipitations observed in 51
meteorological stations in Paris and its surroundings in the year 1938. We
suppose that the differences between the 51 observations are only due to chance.
The step function $m - \tfrac{1}{2}$, $x_m$ is plotted in figure 2. To obtain the theoretical
ogive we compute the constants in (52). They are

(57) $\bar{x} = 571.92; \quad \sigma\sqrt{2} = 38.52.$

The theoretical values x obtained from (52), the cumulative frequencies nF(x)
obtained from the table of the Gaussian integral [11] and the standard errors

(58) $\sigma(x_m) = 5.393\,[\sigma(z)\sqrt{n}],$

obtained from (56) are given in the columns 2 to 5 and 7 of Table I.

We trace in figure 2 the theoretical curve nF(x), x and the confidence band
obtained from col. 7 by the methods described above. All observations are
contained within the control curves. We may accept the theory that the differences
between the annual rainfalls observed in the 51 stations are only due to
chance.
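The columns of Table I can be recomputed from (52) to (56) and the constants (57). The Python sketch below is only an illustration under stated assumptions: it takes the reduced density to be $g(z) = e^{-z^2}/\sqrt{\pi}$, so that the reduced standard error is $\sqrt{G(1-G)}/g$, and it uses the standard-library error function for the Gaussian integral. The function and key names are hypothetical.

```python
import math

X_BAR = 571.92   # mean, from (57)
SCALE = 38.52    # sigma * sqrt(2), from (57)
N = 51           # number of stations

def gaussian_integral(z):
    """Gaussian integral (2/sqrt(pi)) * int_0^z exp(-t^2) dt, i.e. the error function."""
    return math.erf(z)

def reduced_standard_error(z):
    """Reduced standard error sqrt(G(1-G))/g for the reduced normal variate z."""
    phi = gaussian_integral(z)
    return 0.5 * math.sqrt(math.pi) * math.exp(z * z) * math.sqrt(1.0 - phi * phi)

def band_row(z):
    """One table row for the pair of reduced variates -z and +z."""
    g_upper = 0.5 * (1.0 + gaussian_integral(z))
    return {
        "x_lower": X_BAR - SCALE * z,
        "x_upper": X_BAR + SCALE * z,
        "freq_lower": N * (1.0 - g_upper),
        "freq_upper": N * g_upper,
        "reduced_se": reduced_standard_error(z),
        "se": SCALE / math.sqrt(N) * reduced_standard_error(z),
    }

for z in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    r = band_row(z)
    print(f"{z:3.1f} {r['x_lower']:7.1f} {r['x_upper']:7.1f} "
          f"{r['freq_lower']:6.2f} {r['freq_upper']:6.2f} "
          f"{r['reduced_se']:6.3f} {r['se']:4.1f}")
```

The confidence band at a given ordinate nF(x) is then the interval from `x - se` to `x + se`, drawn on each branch of the ogive.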

7. Conclusions. To test a statistical hypothesis for a continuous variate we
use the ogive, the equiprobability method, based on (1), and the return periods
(5). The three tests may be combined on appropriate probability paper. As
the rank of the mth observation $x_m$ may be m or m - 1, we have two series of
observations. To obtain one and only one series we use for the ogive the serial
number $m - \tfrac{1}{2}$, provided that the number of observations is large. Generally,
we attribute to $x_m$ an adjusted frequency, namely, the probability (15) of the
most probable mth value. The adjusted frequency is obtained from the serial
number $m - \tfrac{1}{2}$ and a correction, D, equation (17), which depends upon the
distribution. The correction is important for the three tests and small n; furthermore,
for the equiprobability test and the return periods, for the extreme observations
and any number n.

The same correction D is used for estimating a grade through its relation (26)
to the corresponding most probable serial number m. For distributions where
the second moment does not exist, we estimate the arithmetic mean from a
TABLE I

Normal Confidence Band and Theoretical Frequencies of the Rainfalls

 Reduced    Variate   Variate   Frequency   Frequency   Reduced          Standard
 variate    x         x         51 F(x)     51 F(x)     standard error   error
 ± z        (at -z)   (at +z)   (at -z)     (at +z)     [σ(z)√n]         σ(x_m)
 (1)        (2)       (3)       (4)         (5)         (6)              (7)

 0          571.9     571.9     25.50       25.50        .886            4.8
  .2        564.2     579.6     19.82       31.18        .899            4.9
  .4        556.5     587.3     14.58       36.42        .940            5.1
  .6        548.8     595.0     10.10       40.90       1.012            5.5
  .8        541.0     602.7      6.58       44.42       1.127            6.1
 1.0        533.4     610.4      4.01       46.99       1.297            7.0
 1.2        525.7     618.1      2.29       48.71
 1.4        518.0     625.9      1.22       49.78
 1.6        510.3     633.6       .60       50.40
 1.8        502.6     641.3       .28       50.72


grade. For asymmetrical distributions we estimate the mode from a grade 
by (32) and (48). 

In this case, we have to introduce a distinction between the mode and the most 
precise grade (43). The adjusted frequency and the estimates for grades may 
be used even for small numbers of observations.

The standard error of these estimates is obtained, equation (33), from the
limiting normal form of the distribution of the mth value, which holds provided
the serial number is not extreme. To control a hypothesis we construct confidence
bands, which are obtained from the standard errors of the grades.
FITTING GENERAL GRAM-CHARLIER SERIES

By Paul A. Samuelson
Massachusetts Institute of Technology 

1. Introduction. Since the last part of the nineteenth century at least, it has
been common to represent a probability distribution by means of a linear sum
of terms consisting of a parent function and its successive derivatives. Usually
the parent function is the Type A or normal curve, as discussed by Gram [1],
Bruns [2], Charlier [3], and numerous others. In addition there have been
generalizations in various directions: for example, the Type B expansion in terms
of the Poisson parent function and its successive finite differences.

Unlike these two types, which have a definite probability interpretation,
another generalization involves the use of other parent functions and their
derivatives (or differences) to give an approximate representation of a given
frequency curve. With this process are associated the names of Charlier, Carver
[4], Roa [5], and many others. Two general methods by which the equating of
moments of the fitted curve and the given distribution yields the appropriate
coefficients have been given by Charlier and Carver respectively. An account
of the latter’s technique is more accessible to the average English speaking
statistician.

It is the purpose of the present discussion to indicate how the Charlier method
may be simplified, and can be used to replace the Carver method. In doing
so, I am following up the oral suggestion made some years ago by Professor
E. B. Wilson of Harvard, that repeated integration by parts will yield the requisite
coefficients very simply. At the same time certain methods implicit in
the work of Dr. A. C. Aitken [6] show how the use of a moment generating
function can often lighten the algebraic analysis. There will also be a brief
indication of analogous results for general finite difference parent families; and
attention will be called to a troublesome historical blunder which has permeated
the statistical literature.

2. Alternative methods. Avoiding the overburdened expression “generating
function,” I shall consider parent functions, called f(x), with the restrictive
properties:

a) Moments of all orders of f(x) exist.

b) Derivatives of any required order exist with appropriate continuity.

c) There exists high order contact at the extremities of the distribution as
defined below.

Mathematically,

a) $\int_{-\infty}^{\infty} x^k f(x)\,dx$ is finite for all positive integral values of k
and

c) $\lim_{x \to \pm\infty} x^j f^{(k)}(x) = 0$ for all positive integral values of j and k.

These conditions suffice for many statistically interesting cases, but where desirable
they can be lightened. Thus, derivatives may only be defined “almost
everywhere,” and there may be finite instead of infinite limits to the distribution,
etc.

Given an arbitrary frequency curve F(x), we shall suppose it to be formally
expanded in the series

(1) $F(x) \sim a_0 f(x) + a_1 f'(x) + a_2 f''(x) + \cdots + a_n f^{(n)}(x) + \cdots.$

For convenience in what follows, we shall assume that all distributions are given
in terms of relative frequency so that the area under both f and F is equal to
unity, so that $a_0$ may be taken as unity. The suppressed absolute frequency
can clearly be restored at any time by multiplication of both sides with the
appropriate constant. Also for algebraic convenience, many writers consider
the slightly modified form of the expansion

$F(x) \sim a_0 f(x) - \dfrac{a_1}{1!} f'(x) + \dfrac{a_2}{2!} f''(x) + \cdots + \dfrac{(-1)^n a_n}{n!} f^{(n)}(x) + \cdots.$

It is assumed without discussion that the first n coefficients in such a series are
to be determined by equating the first n moments of each side.

I shall prove the two following identities:

(2) $(-1)^n a_n = L_n(F) - \sum_{i=0}^{n-1} L_{n-i}(f)\,(-1)^i a_i,$

where

$L_j(f) = \dfrac{1}{j!} \int_{-\infty}^{\infty} x^j f(x)\,dx.$

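Alternatively

(3) $(-1)^n a_n = \dfrac{1}{n!} \left[ \dfrac{d^n}{d\alpha^n} \dfrac{M(\alpha; F)}{M(\alpha; f)} \right]_{\alpha = 0}.$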

The first of these, which I owe to Prof. Wilson, is implicit in Charlier’s work.
The second, which may fairly be attributed to Aitken, may reduce the actual
work in many special cases met in practice.

Both of these methods are closely related to the Charlier device of finding
polynomials $S_n(x)$ with the bi-orthogonal property

$\int_{-\infty}^{\infty} S_n(x)\,f^{(i)}(x)\,dx = 0, \quad i \ne n.$

The subscript indicates the degree of the polynomial. By means of n of the
above relationships, the polynomials can be determined except for a factor of
proportionality. By formal integration of both sides of our expansion we have
the Charlier identity

$a_n = \int_{-\infty}^{\infty} S_n(x)\,F(x)\,dx \Big/ \text{factor of proportionality}.$

From a theoretical standpoint, this method leaves little to be desired; but in
practice the algebraic work increases rapidly with the number of terms to be
included in the series.

In the Carver method, the new parent function in question, as well as the
function to be approximated, are both expanded in terms of the normal curve,
thus almost doubling the numerical calculations. After some differentiation,
the members of the Type A family are eliminated, yielding in the process the
required coefficients in terms of the new parent family. We shall see below how
this method may be related to the three above.

3. Useful relationship. First, two simple identities may be presented:

$L_j(f^{(i)}) = (-1)^i L_{j-i}(f), \quad j \ge i$

$L_j(f^{(i)}) = 0, \quad j < i.$

Given the above assumptions of high contact, this follows immediately from
repeated integration by parts.

Remembering that the reduced moments defined just above are the coefficients
of the powers of $\alpha$ in the series expansion of the moment generating function

$M(\alpha; f) = \int_{-\infty}^{\infty} e^{\alpha x} f(x)\,dx = L_0(f) + L_1(f)\,\alpha + L_2(f)\,\alpha^2 + \cdots,$

we have the useful Aitken identity

(4) $M(\alpha; f^{(i)}) = (-\alpha)^i\,M(\alpha; f).$

This, too, is the immediate consequence of repeated integration by parts.

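4. Derivation of first method. Formally multiplying each side of (1) by
$x^n/n!$ and integrating, we have the formal identity

$L_n(F) = a_0 L_n(f) - a_1 L_{n-1}(f) + \cdots + (-1)^n a_n L_0(f).$

This is a “triangular” system of linear equations in the unknown a’s. It may
be written in matrix terms

$\begin{bmatrix} L_0(F) \\ L_1(F) \\ L_2(F) \\ \vdots \end{bmatrix} = \begin{bmatrix} L_0(f) & 0 & 0 & \cdots \\ L_1(f) & L_0(f) & 0 & \cdots \\ L_2(f) & L_1(f) & L_0(f) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} a_0 \\ -a_1 \\ a_2 \\ \vdots \end{bmatrix}.$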


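The triangular matrix has the very special property that all of its elements are
known as soon as the first column is given. For this reason, as we shall see, it
is essentially equivalent to a simple sequence of numbers. This we shall call
the sequence property. Because of this special form, the above system by simple
rearrangement may be written in the modified form

$\begin{bmatrix} L_0(F) & 0 & 0 & \cdots \\ L_1(F) & L_0(F) & 0 & \cdots \\ L_2(F) & L_1(F) & L_0(F) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} = \begin{bmatrix} L_0(f) & 0 & 0 & \cdots \\ L_1(f) & L_0(f) & 0 & \cdots \\ L_2(f) & L_1(f) & L_0(f) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} a_0 & 0 & 0 & \cdots \\ -a_1 & a_0 & 0 & \cdots \\ a_2 & -a_1 & a_0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$

By appropriate definition of symbolism, this may be written in the simple matrix
form:

$L(F) = L(f)\,a(F, f),$

since multiplication of two triangular, “sequence” matrices is commutative.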

It is usually simplest to invert this triangular solution directly as in (2).
But if necessary, we may express our answer in the equivalent form

(5) $a(F, f) = L(f)^{-1}\,L(F),$

where the inverse to any special triangular matrix also possesses the sequence
property.
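The direct inversion of the triangular system can be sketched in a few lines of Python. This is a minimal illustration, not the author's own computing scheme: it solves $L_n(F) = \sum_i (-1)^i a_i L_{n-i}(f)$ by forward substitution. The test case (normal parent of unit variance, normal F of variance 1.5) and all function names are assumptions chosen because the answer is known in closed form, $a_{2k} = ((s^2 - 1)/2)^k / k!$ with odd coefficients zero.

```python
import math

def solve_coefficients(L_F, L_f):
    """Forward substitution for a_0, a_1, ... in L_n(F) = sum_i (-1)^i a_i L_{n-i}(f)."""
    a = []
    for n in range(len(L_F)):
        # Contribution of the coefficients already determined.
        known = sum((-1) ** i * a[i] * L_f[n - i] for i in range(n))
        a.append((-1) ** n * (L_F[n] - known) / L_f[0])
    return a

def normal_reduced_moments(variance, count):
    """Reduced moments (k-th moment divided by k!) of a zero-mean normal curve.

    These are the power-series coefficients of exp(variance * alpha^2 / 2).
    """
    out = []
    for k in range(count):
        if k % 2:
            out.append(0.0)
        else:
            j = k // 2
            out.append((variance / 2.0) ** j / math.factorial(j))
    return out

L_f = normal_reduced_moments(1.0, 6)   # parent function: unit normal
L_F = normal_reduced_moments(1.5, 6)   # curve to be fitted: normal, variance 1.5
a = solve_coefficients(L_F, L_f)
print(a)
```

Each new coefficient uses only the earlier ones, which is the property of triangular reductions noted in the text: adding further terms never changes the coefficients already found.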

If g is a second parent function with the properties of Section 2, we have the
relationship

$a(F, g) = a(F, f)\,a(f, g),$

which follows directly from (5). This may be generalized to

$a(f_1, f_2)\,a(f_2, f_3) \cdots a(f_{n-1}, f_n) = a(f_1, f_n).$

If F itself is a parent function, we have

$a(F, f)\,a(f, F) = a(F, F) = 1$

or

$a(F, f) = a(f, F)^{-1}.$

5. Relation to old methods. In terms of our notation, the Carver method
seems to reduce to computing a(F, f) by the relationship

$a(F, f) = a(F, \phi)\,a(f, \phi)^{-1},$

where $\phi$ is the Type A parent function. It involves a doubling of the work of
coefficient determination. However, if only a few terms in the expansion are
retained, this is of negligible importance.





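The Charlier polynomials are clearly summed rows of the matrix product

$L(f)^{-1} \begin{bmatrix} 1 & 0 & 0 & \cdots \\ 0 & x/1! & 0 & \cdots \\ 0 & 0 & x^2/2! & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$
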

To know the first n of these polynomials, it is not necessary to derive
n(n + 1)/2 different coefficients. Because of the sequence property, it is only
necessary to derive n elements of the first column of $L(f)^{-1}$. These can be
expressed in terms of the reduced moments of f, as did Charlier; but the relationships
are non-linear and algebraically become tedious for high n. They are
better computed from sequence relationships.

The above discussion suggests that the bi-orthogonal relationship between a
parent family and suitable polynomials has no deep significance. In particular,
there is no essential relationship to least squares as in orthogonal expansions.
It does, however, share one important property with orthogonal functions:
determination of later coefficients does not affect the earlier ones. But this is a
property of all triangular reductions, orthogonal expansions being only special
cases of these.

6. Sequence properties. Ordinarily to derive the inverse of an n matrix,
$n^2$ equations must be solved. For our triangular matrices, we need only solve n
equations for one column. To each triangular matrix L(f) there corresponds a
sequence $\{L_k(f)\}$, which is in fact the first column of the former. Similarly to
$L(f)^{-1}$, there corresponds $\{\bar{L}_k(f)\}$; the elements of the latter are defined by the n
equations

$L_0(f)\,\bar{L}_0(f) = 1$

$L_0(f)\,\bar{L}_1(f) + L_1(f)\,\bar{L}_0(f) = 0$

$\cdots$

$\sum_{k=0}^{n} L_k(f)\,\bar{L}_{n-k}(f) = 0.$

But these are precisely the equations involved in the formal inversion of any
linear operator system of the form

(6) $\sum_{k=0}^{\infty} c_k h^k y = z,$

where h is an operator which commutes with a constant, and for which $h^0 = 1$.
z is a known function and y unknown. Thus h may be such operators as

$x, \quad d/dx, \quad x\,d/dx, \quad E, \quad \Delta.$

A particular solution of (6) is given by the formal expansion

$y = \sum_{k=0}^{\infty} \bar{c}_k h^k z,$

where the $\bar{c}$’s bear the same relationship to the c’s as do the $\bar{L}$’s to the L’s.

Such “reciprocal” sequences appear in many branches of applied mathematics.
In particular, they arise in the inversion of a power series. If formally,

$W(\alpha) = \sum_{k=0}^{\infty} w_k \alpha^k,$

then

$\dfrac{1}{W(\alpha)} = \sum_{k=0}^{\infty} \bar{w}_k \alpha^k.$

Thus, to any triangular matrix with the sequence property, we can formally
associate a function $W(\alpha)$ as well as a sequence of numbers. The calculus of
multiplication of our triangular matrices clearly “corresponds” to the calculus
of multiplication of functions, i.e. if the triangular matrices $T_1, T_2, \cdots, T_n$ and
$W_1(\alpha), W_2(\alpha), \cdots, W_n(\alpha)$ correspond, and $T_n = T_1 T_2 \cdots T_{n-1}$; then

$W_n(\alpha) = W_1(\alpha)\,W_2(\alpha) \cdots W_{n-1}(\alpha).$

Also, $1/W_i(\alpha)$ corresponds to $T_i^{-1}$.
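The reciprocal sequence and the correspondence between matrix multiplication and multiplication of series can be illustrated as follows. This is a hedged Python sketch: the recursion is the set of n equations stated above, the Cauchy product stands for multiplication of two sequence matrices (it yields the first column of their product), and the exponential series is an assumed test case whose reciprocal is known. The function names are illustrative.

```python
import math

def reciprocal_sequence(w):
    """First len(w) coefficients of 1/W(alpha), given those of W(alpha); needs w[0] != 0."""
    w_bar = [1.0 / w[0]]
    for n in range(1, len(w)):
        # sum_{k=0}^{n} w_k * w_bar_{n-k} = 0, solved for w_bar_n
        s = sum(w[k] * w_bar[n - k] for k in range(1, n + 1))
        w_bar.append(-s / w[0])
    return w_bar

def multiply(u, v):
    """Truncated Cauchy product: the first column of the product of two sequence matrices."""
    size = min(len(u), len(v))
    return [sum(u[k] * v[n - k] for k in range(n + 1)) for n in range(size)]

# Assumed test case: W(alpha) = exp(alpha), whose reciprocal is exp(-alpha).
w = [1.0 / math.factorial(k) for k in range(8)]
w_bar = reciprocal_sequence(w)
identity = multiply(w, w_bar)
print(w_bar)
print(identity)
```

Only n numbers are solved for, rather than the $n^2$ entries of a general inverse, which is the saving the sequence property provides.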

7. Moment generating functions. If only for the above reasons and no
others, we should be tempted to consider the function formally defined by

$\sum_{k=0}^{\infty} L_k(f)\,\alpha^k.$

But this is precisely the expression for the familiar moment generating function,
m. g. f.,

$M(\alpha; f) = \int_{-\infty}^{\infty} e^{\alpha x} f(x)\,dx = \sum_{k=0}^{\infty} L_k(f)\,\alpha^k.$

In this way, the method of triangular matrices joins the method used by Aitken
for the Type A family. If

$F(x) \sim \sum_{i=0}^{\infty} a_i f^{(i)}(x),$

and we formally equate moment generating functions of each side, we get

(7) $M(\alpha; F) = M(\alpha; f) \sum_{i=0}^{\infty} (-1)^i a_i \alpha^i,$

by means of the Aitken identity (4). Thus $(-1)^i a_i$ equals the coefficient of $\alpha^i$
in the formal expansion of

$\dfrac{M(\alpha; F)}{M(\alpha; f)} = M(\alpha; F)\,M(\alpha; f)^{-1}.$

Our relationship (2) follows immediately from (7); and by Taylor’s expansion in
$\alpha$ the identity (3) is quickly realized.

For many problems, the reciprocal of the m. g. f. of f(x) is itself a simple function,
so that our triangular equations may be inverted without solving linear
equations. Thus where F(x) = f(x + b), we immediately verify Taylor’s expansion
by use of familiar properties of the m. g. f. under shift of origin.
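The shift example can be checked numerically. The sketch below assumes a unit normal parent f and takes F(x) = f(x + b), a normal curve with mean -b. It forms the reduced moments of both curves independently, divides the two moment generating series, and compares the resulting coefficients with the Taylor coefficients $b^i/i!$. The moment recurrence for the normal curve and all function names are assumptions of this illustration.

```python
import math

def normal_raw_moments(mean, variance, count):
    """Raw moments E[x^k], k = 0..count-1, of a normal curve.

    Uses the recurrence mu_k = mean * mu_{k-1} + (k - 1) * variance * mu_{k-2}.
    """
    mu = [1.0, mean]
    for k in range(2, count):
        mu.append(mean * mu[k - 1] + (k - 1) * variance * mu[k - 2])
    return mu[:count]

def reduced(moments):
    """Reduced moments: k-th moment divided by k!."""
    return [m / math.factorial(k) for k, m in enumerate(moments)]

def series_quotient(num, den):
    """Coefficients of num(alpha) / den(alpha) as formal power series; needs den[0] != 0."""
    q = []
    for n in range(len(num)):
        s = sum(q[k] * den[n - k] for k in range(n))
        q.append((num[n] - s) / den[0])
    return q

def shift_coefficients(b, count):
    """Coefficients a_i of F(x) = f(x + b) in terms of the derivatives of a unit normal f."""
    L_f = reduced(normal_raw_moments(0.0, 1.0, count))
    L_F = reduced(normal_raw_moments(-b, 1.0, count))   # f(x + b) has mean -b
    ratio = series_quotient(L_F, L_f)                   # coefficients of M(F)/M(f)
    return [(-1) ** i * c for i, c in enumerate(ratio)]

coeffs = shift_coefficients(0.5, 6)
print(coeffs)
```

For b = 0.5 the coefficients should agree with 1, 0.5, 0.125, and so on, that is, with Taylor's expansion of f(x + b) about x.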

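8. Finite difference expansions. Corresponding to integration by parts, we
have the formula

$\sum_{-\infty}^{\infty} U_x \nabla^k V_x = (-1) \sum_{-\infty}^{\infty} \Delta U_x\,\nabla^{k-1} V_x = (-1)^2 \sum_{-\infty}^{\infty} \Delta^2 U_x\,\nabla^{k-2} V_x, \text{ etc.},$

provided “high contact” properties are assumed. $\nabla$ and $\Delta$ are receding and
advancing differences respectively. Recalling the familiar property of “reduced
factorial” polynomials, $x^{(j)}/j!$, we have

$\sum_{-\infty}^{\infty} \dfrac{x^{(j)}}{j!}\,\nabla^k f(x) = (-1)^k \sum_{-\infty}^{\infty} \dfrac{x^{(j-k)}}{(j-k)!}\,f(x), \quad j \ge k$

$= 0, \quad j < k,$

or

$Q_j(\nabla^k f) = (-1)^k Q_{j-k}(f), \quad j \ge k$

$= 0, \quad j < k,$

where

$Q_j(f) = \sum_{-\infty}^{\infty} \dfrac{x^{(j)}}{j!}\,f(x).$

In the expansion

$F(x) \sim a_0 f(x) + a_1 \nabla f(x) + a_2 \nabla^2 f(x) + \cdots,$

the a’s obey laws identical to (2) and (3) where reduced factorial moments are
substituted for the reduced L moments, and the f. m. g. f.

$\sum_{-\infty}^{\infty} f(x)\,(1 + \alpha)^x,$

for the ordinary m. g. f.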

9. Convergence. All of the above relationships are purely formal, without
regard to convergence. The last is a difficult subject, and little discussed in the
statistical literature, since applications of G-C series have been almost entirely
concerned with empirical frequency curve fitting in which mathematical convergence
does not enter. Actually, in the scanty treatments of the subject there
has arisen a confusion between the Type A G-C expansion, which equates
moments, and the expansion of a function in orthogonal Hermite functions.
These are not unrelated, but nevertheless they are distinct. This is well recognized
in the purely mathematical literature, but hardly at all in the literature of
statistics and physics.

The series differ by an irremovable factor of 2. If the Type A functions are
written as

$H_i(x)\,e^{-x^2},$

then the Hermite functions will take the form

$H_i(x)\,e^{-\frac{1}{2}x^2},$

where the H’s are Hermite polynomials suitably normalized. Unfortunately
the G-C series often diverges when the H series converges. Thus, the statistically
interesting Cauchy distribution can be expanded in an H series; but since it
possesses no finite higher moments, the G-C series cannot even be defined.

It is not hard to show that the G-C expansion of F in terms of a Type A function
f(x) is equivalent to an H expansion of $F f^{-1/2}$ in terms of the H family $f^{1/2}$.
It is sufficient for convergence in the mean of the last expansion that $F f^{-1/2}$ be of
integrable square or belong to $L_2$. This means that the G-C Type A expansion
will be valid if $F f^{-1/2}$ is well behaved, not simply if F is well behaved. For F a
histogram, as is often the case in practice, no difficulties of convergence arise,
although rapid convergence may be another matter. Nevertheless, many well
behaved F’s will not pass the more strict test. The reader is referred to the last
five titles in the bibliography for mathematical discussions of this problem.

The above discussion holds only for the Type A expansion. There remains the
very difficult problem of convergence conditions in the more general case. No
immediate generalization suggests itself, except the application of the results of
the “moment problem.” However, this must be handled with delicacy, since
the partial sums of the series may actually become negative over some range.
A METHOD OF TESTING THE HYPOTHESIS THAT TWO SAMPLES
ARE FROM THE SAME POPULATION 

By Harold C. Mathisen 
Princeton University 

1. Introduction. There are many cases in testing whether two samples are
from the same population in which no assumption about the distribution function
of the population can be made except that it is continuous. A. Wald and
J. Wolfowitz [1] have developed a method of testing the hypothesis that two
samples come from the same population based on certain kinds of runs of the
elements from each sample in the combined ordered sample. W. J. Dixon [2]
has introduced a criterion for testing the same hypothesis based on the number
of elements of the second sample falling between each successive pair of ordered
values in the first sample.

The problem considered here is that of devising a simple method of testing
the hypothesis that two samples come from the same population, based on
medians and quartiles, given only that the distribution function of the population
is continuous. The simplest method may be described briefly as follows.
We observe the number of elements, $m_1$, in the second sample whose values are
lower than the median of the first sample. Since the distribution of $m_1$ is independent
of the population distribution, we are able to compute significance
points from the distribution of $m_1$. These points may then be used for testing
the hypothesis at a given significance level. This will be referred to as the case
of two intervals.

This method may be easily extended to the case of any number of intervals.
In this note we shall consider the extension to four intervals by using the median
and the two quartiles of the first sample to establish four intervals into which
the elements of the second sample may fall. Then, if the second sample is of
size 4m, it will be shown that, under the hypothesis that the two samples come
from the same population, 1/4 of the second sample, or m elements, will be expected
to fall in each interval. Let the number in the second sample which actually
fall in each interval be $m_1$, $m_2$, $m_3$, and $m_4$ respectively. The test function
here proposed is

(1) $C = \dfrac{(m_1 - m)^2 + (m_2 - m)^2 + (m_3 - m)^2 + (m_4 - m)^2}{9m^2},$

where $9m^2$ is a constant, which forces C to lie on the interval 0 to 1. If the $m_i$
(i = 1, 2, 3, 4) have values quite different from their expected value m, it is
apparent that C will be large. Therefore the greater the value of C the more
doubtful is the hypothesis that the two samples come from the same population.
Significance values of C will be computed for several sample sizes. The question
of whether C is the “best” four-interval criterion for testing the hypothesis
that two samples come from the same continuous distribution is an open one
which would depend for its answer on an extensive power function analysis.
We shall not go into this analysis, however, but shall use C on intuitive grounds.
This case will be referred to as the case of four intervals. The extension of the
method of the case of four intervals to any number of intervals presents no new
difficulties in derivation; however, we shall confine our attention to the cases of
two and four intervals.
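As a small worked illustration of the test function (1), the following Python sketch computes C from the four observed counts. The function name and the input check are assumptions of this example; how the counts are obtained from the two samples is not shown here.

```python
def c_statistic(counts):
    """Four-interval criterion (1) for counts (m1, m2, m3, m4) whose sum is 4m."""
    total = sum(counts)
    if len(counts) != 4 or total == 0 or total % 4:
        raise ValueError("need four counts whose sum is a positive multiple of 4")
    m = total // 4                      # expected number in each interval
    return sum((mi - m) ** 2 for mi in counts) / (9 * m * m)

balanced = c_statistic([5, 5, 5, 5])    # counts equal to their expectation
uneven = c_statistic([7, 5, 4, 4])      # (4 + 0 + 1 + 1) / (9 * 25)
print(balanced, uneven)
```

Counts equal to their expected value m give C = 0, and C grows as the second sample spreads unevenly over the four intervals.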


2. The case of two intervals. Suppose f(x) is a continuous distribution function
with probability element f(x) dx. Let us draw a sample of size 2n + 1 from
a population having this probability element. Let the elements in the sample
be $x_1, x_2, \cdots, x_{2n+1}$ ordered from least to greatest. The median of this sample
will be $x_{n+1}$. Now consider a second sample of size 2m, and let $m_1$ be the number
of observations whose values are less than $x_{n+1}$. We call $m_2 = 2m - m_1$
the number of elements in the second sample greater than $x_{n+1}$.

Let $p = \int_{-\infty}^{x_{n+1}} f(x)\,dx$ be the probability of an observation having a value less
than $x_{n+1}$. Then the probability of an element having a value greater than
$x_{n+1}$ is (1 - p). Thus we have the relation $f(x_{n+1})\,dx_{n+1} = dp$. The probability
law of the median, $x_{n+1}$, given by the multinomial law¹ is

(2) $P_r(x_{n+1}) = \dfrac{(2n + 1)!}{n!\,n!}\,p^n (1 - p)^n\,dp.$

The conditional probability law of $m_1$, given $x_{n+1}$, is then

(3) $P_r(m_1 \mid x_{n+1}) = \dfrac{(2m)!}{m_1!\,(2m - m_1)!}\,p^{m_1} (1 - p)^{2m - m_1}.$

From this it follows that the joint probability law of $x_{n+1}$ and $m_1$ is the product
of (2) and (3) or

(4) $P_r(m_1, x_{n+1}) = \dfrac{(2n + 1)!\,(2m)!}{n!\,n!\,m_1!\,(2m - m_1)!}\,p^{n + m_1} (1 - p)^{n + 2m - m_1}\,dp.$

We may integrate (4) with respect to p from 0 to 1 as a Beta Function, leaving
the distribution function of $m_1$ independent of the population probability element
f(x) dx. We get for the distribution of $m_1$,

(5)    $\Pr(m_1) = \dfrac{(2n+1)!\,(2m)!\,(n+m_1)!\,(n+2m-m_1)!}{n!\,n!\,m_1!\,(2m-m_1)!\,(2n+1+2m)!}.$

1 The multinomial law may be stated briefly as follows. If a trial results in
one and only one of the mutually exclusive events $E_1, E_2, \ldots, E_k$, the
probability $P$ that in a total of $n$ trials, $n_1$ will result in $E_1$,
$n_2$ in $E_2$, $\ldots$, $n_k$ in $E_k$ $\left(\sum_i n_i = n\right)$ is given by

    $P = \dfrac{n!}{n_1!\,n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k},$

where $p_1, p_2, \ldots, p_k$ are the probabilities of a single trial resulting
in $E_1, E_2, \ldots, E_k$ respectively.

From (5) a simple recursion relation between $\Pr(m_1)$ and $\Pr(m_1 + 1)$ may
be determined, from which the probabilities of various values of $m_1$ may be
rapidly computed. For large samples it can be shown that, under certain
regularity conditions, the ratio $[m_1 - E(m_1)]/\sigma_{m_1}$ may be
approximated by the normal distribution$^2$ with zero mean and unit variance.
The derivation is similar to that of the four-interval case, which is taken up
in greater detail. It will be found by the use of (4) that the expected value
of $m_1$ is $m$, and the variance of $m_1$ is
$m + \dfrac{m(2m-1)(n+2)}{2n+3} - m^2$. Using this information, values of $m_1$
for various significance levels may be computed. The .01 and .05 percentage
points of $m_1$ for several sample sizes are given in Table I. The values for
sample sizes of 10 and 40 are computed directly from the probability law, while
the larger samples have limits computed by the normal approximation. Thus for
two samples of size 101 and 100, respectively, a value of $m_1$ less than 38
would be significant at the .05 level. Similarly, at the upper .05 level, the
hypothesis would be rejected if a value of $m_1$ were obtained which was greater
than 62. The necessity for the upper limits could easily be eliminated by
testing with respect to the smaller of $m_1$ and $m_2$. However, for
completeness, the upper percentage points are included to show the range of
values of $m_1$ in which the hypothesis that the two samples come from the same
population may be accepted.


TABLE I

The Case of Two Intervals

Lower and upper .01 and .05 percentage points for the distribution of $m_1$

    Sample sizes                  Critical values of m_1
  First    Second            Lower                    Upper
  2n + 1     2m       m_1(.01)   m_1(.05)      m_1(.05)   m_1(.01)

     11       10    [illegible]      1             9    [illegible]
     41       40         10         12            28         30
    101      100         34         38            62         66
    101      200         72         80           120        128
    201      200         77         84           116        123
    201      400        160        181           219        240
    401      400        167        177           223        233
    401      800        353        367           433        447
   1001     1000        448        463           537        552


2 This statement may be proved by showing that as $m, n \to \infty$ such that
$m/n$ = constant, the limit of the moment generating function for the ratio is
identical with the moment generating function of the normal distribution with
zero mean and unit variance.
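
The distribution (5) and the moments quoted above lend themselves to a direct numerical check. What follows is a minimal sketch in Python, not part of the original paper. The function names are illustrative, and the ratio $\Pr(m_1+1)/\Pr(m_1) = (n+m_1+1)(2m-m_1)/[(m_1+1)(n+2m-m_1)]$ used for the recursion is derived here from (5) rather than quoted from the text, so it should be read as an assumption about the form the author had in mind.

```python
from fractions import Fraction
from math import factorial


def pmf_direct(n, m):
    """Pr(m1 = k), k = 0..2m, evaluated term by term from equation (5)."""
    const = Fraction(factorial(2 * n + 1) * factorial(2 * m),
                     factorial(n) ** 2 * factorial(2 * n + 1 + 2 * m))
    return [const * Fraction(factorial(n + k) * factorial(n + 2 * m - k),
                             factorial(k) * factorial(2 * m - k))
            for k in range(2 * m + 1)]


def pmf_recursive(n, m):
    """Same probabilities, built up from Pr(0) with the ratio derived from (5)."""
    p = Fraction(factorial(2 * n + 1) * factorial(n + 2 * m),
                 factorial(n) * factorial(2 * n + 1 + 2 * m))
    probs = [p]
    for k in range(2 * m):
        p = p * (n + k + 1) * (2 * m - k) / ((k + 1) * (n + 2 * m - k))
        probs.append(p)
    return probs


def mean_and_variance(probs):
    """Exact mean and variance of m1 under the given probabilities."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
    return mean, var


n, m = 20, 20            # first sample 2n + 1 = 41, second sample 2m = 40
probs = pmf_recursive(n, m)
mean, var = mean_and_variance(probs)
# Variance stated in the text: m + m(2m - 1)(n + 2)/(2n + 3) - m^2
stated_var = m + Fraction(m * (2 * m - 1) * (n + 2), 2 * n + 3) - m ** 2
print(float(mean), float(var), float(stated_var))
```

Exact rational arithmetic is used so that the agreement between the enumerated moments and the stated ones is an identity rather than a floating-point coincidence. Cumulative sums of `probs` from either end give tail probabilities of the kind tabulated in Table I.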


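3. The case of four intervals. If we let the first sample of size $4n + 3$ be
designated by $(x_1, x_2, \ldots, x_{4n+3})$, assumed drawn from a population
with probability element $f(x)\,dx$ and ordered from least to greatest, then the
range of $x$ may be divided into four intervals by $x_{n+1}$, $x_{2n+2}$, and
$x_{3n+3}$. The probability element of $x_{n+1}$, $x_{2n+2}$, $x_{3n+3}$ is

    $\dfrac{(4n+3)!}{(n!)^4}
     \left(\int_{-\infty}^{x_{n+1}} f(x)\,dx\right)^{n}
     \left(\int_{x_{n+1}}^{x_{2n+2}} f(x)\,dx\right)^{n}
     \left(\int_{x_{2n+2}}^{x_{3n+3}} f(x)\,dx\right)^{n}
     \left(\int_{x_{3n+3}}^{\infty} f(x)\,dx\right)^{n}$
    $\cdot\, f(x_{n+1})\,dx_{n+1}\; f(x_{2n+2})\,dx_{2n+2}\; f(x_{3n+3})\,dx_{3n+3}.$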


TABLE II

The Case of Four Intervals

.95 and .99 percentage points for the distribution of C

    Sample sizes
  First    Second
  4n + 3     4m        n       m      C_.95    C_.99

     15       12       3       3      .446     .582
     63       60      15      15      .113     .161
    103      100      25      25      .072     .102


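Let

    $\int_{-\infty}^{x_{n+1}} f(x)\,dx = p_1$,  $\int_{x_{n+1}}^{x_{2n+2}} f(x)\,dx = p_2$,
    $\int_{x_{2n+2}}^{x_{3n+3}} f(x)\,dx = p_3$,  $\int_{x_{3n+3}}^{\infty} f(x)\,dx = p_4$.

The probability element of $p_1$, $p_2$, $p_3$, and $p_4$, where
$p_4 = 1 - p_1 - p_2 - p_3$, is

(6)    $\Pr(p_1, p_2, p_3) = \dfrac{(4n+3)!}{(n!)^4}\, p_1^{n} p_2^{n} p_3^{n} p_4^{n}\, dp_1\, dp_2\, dp_3.$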


Now let us consider the second sample, $(x'_1, x'_2, \ldots, x'_{4m})$, of size
$4m$. Let the number of observations falling in each of the preassigned
intervals be $m_i$ $(i = 1, 2, 3, 4)$, where $m_4 = 4m - m_1 - m_2 - m_3$. The
conditional probability of the $m_i$, given the values of $x_{i(n+1)}$, is also
determined by the multinomial law.

(7)    $\Pr(m_i \mid x_{i(n+1)}) = \dfrac{(4m)!}{m_1!\,m_2!\,m_3!\,m_4!}\, p_1^{m_1} p_2^{m_2} p_3^{m_3} p_4^{m_4}.$

The joint distribution of the $p_i$ and the $m_i$ is then

(8)    $\Pr(x_{i(n+1)}, m_i) = \dfrac{(4n+3)!\,(4m)!}{(n!)^4\, m_1!\,m_2!\,m_3!\,m_4!}\, p_1^{n+m_1} p_2^{n+m_2} p_3^{n+m_3} p_4^{n+m_4}\, dp_1\, dp_2\, dp_3.$

To obtain the distribution of the $m_i$ alone, the $p_i$ will be integrated out
by the Dirichlet Integral$^3$ formula, giving a distribution which is clearly
independent of the population distribution function $f(x)$.

(9)    $\Pr(m_i) = \dfrac{(4n+3)!\,(4m)!\,(n+m_1)!\,(n+m_2)!\,(n+m_3)!\,(n+m_4)!}{(n!)^4\, m_1!\,m_2!\,m_3!\,m_4!\,(4m+4n+3)!}.$

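To find the expected value of the $m_i$, the probability law of $m_1$ will
first be derived. The probability function for the value of $x_{n+1}$ is

(10)    $\Pr(x_{n+1}) = \dfrac{(4n+3)!}{n!\,(3n+2)!}\, p_1^{n}(1-p_1)^{3n+2}\, dp_1.$

Then we have the conditional probability

(11)    $\Pr(m_1 \mid x_{n+1}) = \dfrac{(4m)!}{m_1!\,(4m-m_1)!}\, p_1^{m_1}(1-p_1)^{4m-m_1}$

and

(12)    $\Pr(x_{n+1}, m_1) = \dfrac{(4n+3)!\,(4m)!}{n!\,(3n+2)!\,m_1!\,(4m-m_1)!}\, p_1^{n+m_1}(1-p_1)^{3n+2+4m-m_1}\, dp_1.$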


To obtain the expected value of $m_1$, the joint distribution of $m_1$ and
$p_1$ is multiplied by $m_1$, summed on $m_1$ from 0 to $4m$, and integrated on
$p_1$ from 0 to 1.

(13)    $E(m_1) = \dfrac{(4n+3)!}{n!\,(3n+2)!} \int_0^1 p_1^{n}(1-p_1)^{3n+2}
         \left[\sum_{m_1=0}^{4m} m_1 \dfrac{(4m)!}{m_1!\,(4m-m_1)!}\, p_1^{m_1}(1-p_1)^{4m-m_1}\right] dp_1.$

This interchange of the order of integration and summation is clearly valid.
The quantity in brackets will be recognized as the first moment of the binomial
distribution $(p_1 + q)^{4m}$, where $q = 1 - p_1$. Therefore we have

(14)    $E(m_1) = \int_0^1 4m\,p_1 f(p_1)\,dp_1 = 4m\,E(p_1).$

$E(p_1)$ and the higher moments of $p_1$ are found in the usual way by
integrating the distributions as Beta Functions. From this we see that the
expected value of $m_1$ is $m$. By repeating these operations on $m_2$, $m_3$,
and $m_4$, it can be seen that $E(m_i) = m$, which also validates the statement
made in the introduction.
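
For small samples the distribution (9) can be enumerated outright, which gives an independent check that it sums to one and that each $E(m_i)$ equals $m$. The following Python sketch is an illustration only and is not taken from the paper; the function name and the choice $n = 1$, $m = 2$ are assumptions made to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def four_interval_pmf(n, m):
    """Map (m1, m2, m3, m4) -> Pr(m1, m2, m3, m4) according to equation (9)."""
    total = 4 * m
    const = Fraction(factorial(4 * n + 3) * factorial(total),
                     factorial(n) ** 4 * factorial(total + 4 * n + 3))
    pmf = {}
    for m1, m2, m3 in product(range(total + 1), repeat=3):
        m4 = total - m1 - m2 - m3
        if m4 < 0:
            continue  # the four counts must add up to 4m
        counts = (m1, m2, m3, m4)
        num = 1
        den = 1
        for c in counts:
            num *= factorial(n + c)
            den *= factorial(c)
        pmf[counts] = const * Fraction(num, den)
    return pmf


n, m = 1, 2              # first sample 4n + 3 = 7, second sample 4m = 8
pmf = four_interval_pmf(n, m)
total_probability = sum(pmf.values())
expected_counts = [sum(c[i] * p for c, p in pmf.items()) for i in range(4)]
print(total_probability, [float(e) for e in expected_counts])
```

Because (9) involves only the sample sizes and the counts, the same table of probabilities applies whatever continuous population the two samples are drawn from.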


3 A discussion of the Dirichlet Integral may be found in Woods, Advanced
Calculus, p. 167. It may be stated as follows for the problem in which we are
interested:

    $\iiint x^{l-1} y^{m-1} z^{n-1} (1 - x - y - z)^{r-1}\, dx\, dy\, dz
      = \dfrac{\Gamma(l)\,\Gamma(m)\,\Gamma(n)\,\Gamma(r)}{\Gamma(l + m + n + r)},$

where we integrate over the region bounded by $x + y + z = 1$ and the three
coordinate planes.

We have previously presented the criterion (1).

The next problem is to find a distribution function to which the distribution
of C may be fitted. A reasonable choice appears to be the Pearson Type I curve,

(15)    $f(x) = \dfrac{\Gamma(r+s)}{\Gamma(r)\,\Gamma(s)}\, x^{r-1}(1-x)^{s-1}.$


The distribution of C is fitted by equating the first two moments of the two
distributions and solving for the constants $r$ and $s$ of the Type I
distribution. Using the theorem that the mean value of the sum of variates is
equal to the sum of their mean values, we have

(16)    $E(C) = \dfrac{1}{9m^2}\left[E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2) - 4m^2\right].$

Also the second moment may be written as

(17)    $E(C^2) = \dfrac{1}{81m^4}\Big[E(m_1^4) + E(m_2^4) + E(m_3^4) + E(m_4^4) + 16m^4$
         $+\, 2E(m_1^2 m_2^2) + 2E(m_1^2 m_3^2) + 2E(m_1^2 m_4^2) + 2E(m_2^2 m_3^2) + 2E(m_2^2 m_4^2) + 2E(m_3^2 m_4^2)$
         $-\, 8m^2\left[E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2)\right]\Big].$

The expected value of $m_1^2$ is found in the same manner as $E(m_1)$, and here
also it can be shown that the $E(m_i^2)$ are all equal. The same procedure holds
for $E(m_i^4)$.


(18)    $E(m_i^2) = m + \dfrac{m(4m-1)(n+2)}{4n+5},$

        $E(m_i^4) = m + \dfrac{7m(4m-1)(n+2)}{4n+5} + \dfrac{6m(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)}$
         $+\, \dfrac{m(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)}.$

By using the moment generating function of the trinomial distribution, the
$E(m_i^2 m_j^2)$ may also be found in a similar manner.

(19)    $E(m_i^2 m_j^2) = \dfrac{m(4m-1)(n+1)}{4n+5} + \dfrac{2m(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)}$
         $+\, \dfrac{m(4m-1)(4m-2)(4m-3)(n+2)(n+1)(n+2)}{(4n+7)(4n+6)(4n+5)}.$

As a result we have

(20)    $E(C) = \dfrac{4}{9m} + \dfrac{4(4m-1)(n+2)}{9m(4n+5)} - \dfrac{4}{9}.$




Let $E(C) = A$ to simplify later relations to be computed. Finally

(21)    $E(C^2) = \dfrac{4}{81m^3}\Big[\,1 + \dfrac{7(4m-1)(n+2)}{4n+5} + \dfrac{6(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)}$
         $+\, \dfrac{(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)} + 4m^3$
         $+\, \dfrac{3(4m-1)(n+1)}{4n+5} + \dfrac{6(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)}$
         $+\, \dfrac{3(4m-1)(4m-2)(4m-3)(n+2)^2(n+1)}{(4n+7)(4n+6)(4n+5)} - 8m^2$
         $-\, \dfrac{8m^2(4m-1)(n+2)}{4n+5}\Big].$

To simplify later relations we let $E(C^2) = B$.
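
The closed forms (20) and (21) can be compared with moments obtained by enumerating the distribution (9) for a small case. The sketch below is a hedged illustration in Python rather than anything from the paper. It assumes, as (16) implies, that the criterion is $C = \sum_i (m_i - m)^2 / (9m^2)$; the function names and the choice $n = 1$, $m = 2$ are likewise assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def pmf_counts(n, m):
    """Yield ((m1, m2, m3, m4), probability) according to equation (9)."""
    total = 4 * m
    const = Fraction(factorial(4 * n + 3) * factorial(total),
                     factorial(n) ** 4 * factorial(total + 4 * n + 3))
    for m1, m2, m3 in product(range(total + 1), repeat=3):
        m4 = total - m1 - m2 - m3
        if m4 >= 0:
            counts = (m1, m2, m3, m4)
            weight = Fraction(1)
            for c in counts:
                weight *= Fraction(factorial(n + c), factorial(c))
            yield counts, const * weight


def criterion(counts, m):
    """C as implied by (16): squared deviations from m, divided by 9 m^2."""
    return Fraction(sum((c - m) ** 2 for c in counts), 9 * m * m)


def closed_form_moments(n, m):
    """A = E(C) from (20) and B = E(C^2) from (21)."""
    d5, d6, d7 = 4 * n + 5, 4 * n + 6, 4 * n + 7
    f1, f2, f3 = 4 * m - 1, 4 * m - 2, 4 * m - 3
    a = (Fraction(4, 9 * m) + Fraction(4 * f1 * (n + 2), 9 * m * d5)
         - Fraction(4, 9))
    bracket = (1
               + Fraction(7 * f1 * (n + 2), d5)
               + Fraction(6 * f1 * f2 * (n + 3) * (n + 2), d6 * d5)
               + Fraction(f1 * f2 * f3 * (n + 4) * (n + 3) * (n + 2), d7 * d6 * d5)
               + 4 * m ** 3
               + Fraction(3 * f1 * (n + 1), d5)
               + Fraction(6 * f1 * f2 * (n + 1) * (n + 2), d6 * d5)
               + Fraction(3 * f1 * f2 * f3 * (n + 2) ** 2 * (n + 1), d7 * d6 * d5)
               - 8 * m ** 2
               - Fraction(8 * m ** 2 * f1 * (n + 2), d5))
    return a, Fraction(4, 81 * m ** 3) * bracket


n, m = 1, 2
A_enum = sum(criterion(c, m) * p for c, p in pmf_counts(n, m))
B_enum = sum(criterion(c, m) ** 2 * p for c, p in pmf_counts(n, m))
A, B = closed_form_moments(n, m)
print(float(A), float(B))
```

The enumeration grows quickly with $m$, so it is useful only as a spot check; for the sample sizes of Table II the closed forms are the practical route.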

The first two moments of the Type I distribution are easily found to be

(22)    $E(x) = \dfrac{r}{r+s} = A, \qquad E(x^2) = \dfrac{r(r+1)}{(r+s)(r+s+1)} = B.$

Solving these two simultaneous equations for $r$ and $s$,

(23)    $r = \dfrac{B - A}{A - \dfrac{B}{A}}, \qquad s = \dfrac{r}{A} - r.$


A number of percentage points for the Type I distribution have been computed
by Miss Catherine Thompson [3]. Using these limits, the hypothesis that the two
samples come from the same population may be accepted or rejected.
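
The moment fit in (22) and (23) is a two-line computation. The sketch below, in Python, solves for $r$ and $s$ and then confirms that the fitted Type I curve reproduces the two moments it was fitted to. The values of $A$ and $B$ used here are hypothetical, chosen only so that the arithmetic is easy to follow; they are not taken from the paper.

```python
from fractions import Fraction


def type_one_parameters(a, b):
    """Solve (22) for r and s, given A = E(C) and B = E(C^2), as in (23)."""
    r = (b - a) / (a - b / a)
    s = r / a - r
    return r, s


def type_one_moments(r, s):
    """First two moments about the origin of the Type I curve (15)."""
    first = r / (r + s)
    second = r * (r + 1) / ((r + s) * (r + s + 1))
    return first, second


# Illustrative moments only (hypothetical values, not taken from the paper).
A = Fraction(1, 5)
B = Fraction(1, 15)
r, s = type_one_parameters(A, B)
print(r, s, type_one_moments(r, s))
```

With these illustrative values the solution is $r = 1$, $s = 4$. In practice $A$ and $B$ would come from (20) and (21), and the percentage points would then be read from tables of the Type I distribution for the fitted $r$ and $s$.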

Table II shows the .95 and .99 percentage points of C for three sample sizes.


4. Summary. The problem considered here is that of devising a simple
method of testing the hypothesis that two samples are from identical populations
having continuous distribution functions. It may be summarized briefly as
follows. The first sample is used to establish any desired number of intervals
into which the observations of the second sample may fall. A test criterion is
proposed which is based on the deviations of the numbers of elements of the
second sample which fall in the intervals from the expected values of the
respective numbers. Two cases are discussed, that of two intervals and that of
four intervals, making use of the median and quartiles in the first sample to
determine the intervals. Tables of 1% and 5% points for several sample sizes of
both cases are given.


NOTES


This section is devoted to brief research and expository articles, notes on
methodology and other short items.


NOTE ON THE INDEPENDENCE OF CERTAIN QUADRATIC FORMS 

By Allen T. Craig 
University of Iowa 

Various approaches to the problem of the independence of quadratic forms 
in normally and independently distributed variables have been made by R. A.
Fisher, Cochran, Madow and others. It is the purpose of this note to point 
out a few simple propositions which, in so far as the writer is aware, have not 
had specific mention in the literature. 


1. Independence of certain quadratic forms. Theorem 1: A necessary and
sufficient condition that two real symmetric quadratic forms, in n normally and
independently distributed variables, be independent in the probability sense is
that the product of the matrices of the forms be zero.

Let the chance variable $x$ be normally distributed with mean zero and unit
variance. Let $x_1, x_2, \ldots, x_n$ be $n$ independent values of $x$ and let
$A$ and $B$ be two real symmetric matrices, each of order $n$. Write
$Q_1 = \sum\sum a_{ij} x_i x_j$ and $Q_2 = \sum\sum b_{ij} x_i x_j$, where
$\|a_{ij}\| = A$ and $\|b_{ij}\| = B$. It is well known that the generating
function of the moments of the joint distribution of $Q_1$ and $Q_2$ can be
written

    $G(\lambda, \lambda') = |I - \lambda A - \lambda' B|^{-1/2},$

so that

(1)    $|I - \lambda A - \lambda' B| = |I - \lambda A|\,|I - \lambda' B|,$

for all real values of $\lambda$ and $\lambda'$, is necessary and sufficient for
the independence of $Q_1$ and $Q_2$.

If $Q_1$ and $Q_2$ are independent, then (1), being true for all real values of
$\lambda$ and $\lambda'$, is in particular true for $\lambda = \lambda'$. Thus

(2)    $|I - \lambda(A + B)| = |I - \lambda A|\,|I - \lambda B|.$

Denote by $r_1$, $r_2$ and $r \le r_1 + r_2$ respectively the ranks of $A$, $B$
and $A + B$. Then $r = r_1 + r_2$, since (2) expresses the identity of two
polynomials in $\lambda$ of degrees $r$ and $r_1 + r_2$.

Further, if we write

    $|I - \lambda A| = (1 - \lambda p_1) \cdots (1 - \lambda p_{r_1}),$

    $|I - \lambda B| = (1 - \lambda q_1) \cdots (1 - \lambda q_{r_2}),$

and $|I - \lambda(A + B)| = (1 - \lambda s_1) \cdots (1 - \lambda s_{r_1+r_2})$,
then, because the factorization of polynomials is unique, each $s_j$ can be
paired with one of the numbers $p_1, \ldots, p_{r_1}, q_1, \ldots, q_{r_2}$.
Thus, if $Q_1$ and $Q_2$ are independent, the rank of $A + B$ is the sum of the
ranks of $A$ and $B$, and the non-zero roots of the characteristic equation of
$A + B$ are those of the characteristic equation of $A$ together with those of
the characteristic equation of $B$. There exists an appropriately chosen
orthogonal matrix $L$ of order $n$ such that $L'(A + B)L$, $L'$ being the
conjugate of $L$, is a matrix with the reciprocals of the numbers
$p_1, \ldots, p_{r_1}, q_1, \ldots, q_{r_2}$ on the principal diagonal and zeros
elsewhere. Then $L'AL$ and $L'BL$ have no overlapping non-zero elements and
$L'AL\,L'BL = 0$. But $L' = L^{-1}$, the inverse of $L$. Hence, upon multiplying
both members of the preceding equation on the right by $L'$ and on the left by
$L$, we have $AB = 0$. Since $A = A'$ and $B = B'$, likewise $BA = 0$.

Conversely, suppose $AB = 0$. Then the matrix
$(I - \lambda A)(I - \lambda' B) = I - \lambda A - \lambda' B$. These matrices
being equal, their determinants are equal and the condition (1) for the
independence of $Q_1$ and $Q_2$ is satisfied.

The theorem is readily extended to the case of the mutual independence of 
any finite number of such quadratic forms. 

The product of a non-singular matrix and a matrix of rank R is a matrix of
rank R. Hence, every non-singular quadratic form of the kind here discussed
is correlated with every non-identically vanishing quadratic form in the same
variables.
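
Condition (1) of Theorem 1 can be illustrated with exact arithmetic on small matrices. The Python sketch below is an illustration only; the three matrices are hypothetical examples chosen for this purpose and do not appear in the note. It checks that the determinant factorizes when $AB = 0$ and fails to factorize for a pair whose product is not zero.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def det3(a):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def identity_minus(lam, a, mu, b):
    """Return I - lam*A - mu*B for 3 x 3 matrices."""
    return [[(1 if i == j else 0) - lam * a[i][j] - mu * b[i][j]
             for j in range(3)] for i in range(3)]


def factorizes(a, b, lams, mus):
    """True if |I - lam A - mu B| = |I - lam A| |I - mu B| at every test point."""
    zero = [[0] * 3 for _ in range(3)]
    return all(det3(identity_minus(lam, a, mu, b))
               == det3(identity_minus(lam, a, 0, zero))
               * det3(identity_minus(0, zero, mu, b))
               for lam in lams for mu in mus)


half = Fraction(1, 2)
A = [[half, half, 0], [half, half, 0], [0, 0, 0]]        # hypothetical example
B = [[half, -half, 0], [-half, half, 0], [0, 0, 3]]      # chosen so that AB = 0
C = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]                    # AC is not zero
ZERO = [[0] * 3 for _ in range(3)]
lams = (Fraction(1, 3), Fraction(-2), Fraction(5, 7))
mus = (Fraction(1, 4), Fraction(3), Fraction(-1, 9))

print(matmul(A, B) == ZERO, factorizes(A, B, lams, mus))
print(matmul(A, C) == ZERO, factorizes(A, C, lams, mus))
```

Checking a handful of values of $\lambda$ and $\lambda'$ does not prove the identity for all real values, but a single failure, as with the pair $A$, $C$, is enough to rule independence out.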

2. Conditions for independent Chi-Square distributions. The preceding
theorem enables one to determine, by multiplication of matrices, whether real
symmetric quadratic forms in normally and independently distributed variables
are themselves independent in the probability sense. The following theorem
affords a simple test as to whether the distributions are of the Chi-Square type.

Theorem 2: Necessary and sufficient conditions that each of two real symmetric
quadratic forms, in n normally and independently distributed variables with mean
zero and unit variance, be independently distributed as is Chi-Square, are that
the product of the matrices of the forms be zero and that each matrix equal its
own square.

If $Q_1$ and $Q_2$ are independently distributed as is Chi-Square, then
$AB = 0$ and each of the non-zero roots of the characteristic equations of $A$
and $B$ is $+1$. For an appropriately chosen orthogonal matrix $L$, of order
$n$, $L'AL$ is a matrix with $r_1$ elements on the principal diagonal $+1$, all
other elements being zero. For such a matrix it is seen that
$(L'AL)(L'AL) = L'A^2L = L'AL$ and $A^2 = A$. A similar argument shows that
$B^2 = B$.

Conversely, if $AB = 0$, then $Q_1$ and $Q_2$ are independent. Further, if
$A^2 = A$ and $B^2 = B$, each of the non-zero roots of the characteristic
equations of $A$ and $B$ is $+1$. This follows from the fact that the roots of
the characteristic equation of the square of any matrix are themselves the
squares of the roots of the characteristic equation of that matrix. Since $A$
and $B$ are real and symmetric, the roots under consideration are real. Thus
$Q_1$ and $Q_2$ have independent Chi-Square distributions with $r_1$ and $r_2$
degrees of freedom respectively.

This theorem can likewise be extended to any finite number of these quadratic
forms.

Of special interest is the case of, say $k$, quadratic forms for which the sum
of the $k$ matrices is the identity matrix. Thus $A_1 + A_2 + \cdots + A_k = I$.
By Theorem 1, it is both necessary and sufficient for the mutual independence of
the $k$ forms that $A_u A_v = 0$, $u \ne v$.

Now

    $A_j = I - A_1 - \cdots - A_{j-1} - A_{j+1} - \cdots - A_k$

and

    $A_j A_j = A_j - A_1 A_j - \cdots - A_{j-1} A_j - A_{j+1} A_j - \cdots - A_k A_j,$

so that $A_j^2 = A_j$. In this particular case it is to be seen that the mutual
independence of the forms implies that their several distributions are of the
Chi-Square type.
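
A familiar instance of this special case is the splitting of $\sum x_i^2$ into $n\bar{x}^2$ and $\sum (x_i - \bar{x})^2$. The Python sketch below is an illustration added here, not an example given in the note; the choice $n = 4$ and the two matrices are assumptions. It verifies that the two matrices sum to the identity, that their product is zero, and that each therefore equals its own square. The degrees of freedom are read off as traces, using the standard fact that the rank of a symmetric matrix equal to its own square is its trace.

```python
from fractions import Fraction

n = 4
I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
# A1 is the matrix of the form n * (sample mean)^2.
A1 = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]
# A2 = I - A1 is the matrix of the sum of squared deviations from the mean.
A2 = [[I[i][j] - A1[i][j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Product of two n x n matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


ZERO = [[Fraction(0)] * n for _ in range(n)]
orthogonal = matmul(A1, A2) == ZERO
idempotent = matmul(A1, A1) == A1 and matmul(A2, A2) == A2
degrees_of_freedom = (sum(A1[i][i] for i in range(n)),
                      sum(A2[i][i] for i in range(n)))
print(orthogonal, idempotent, degrees_of_freedom)
```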


A CHARACTERIZATION OF THE NORMAL DISTRIBUTION

By Irving Kaplansky
Harvard University

In 1925 R. A. Fisher gave a geometric derivation of the joint distribution of
mean and variance in samples from a normal population (Metron, Vol. 5, pp.
90-104). On examining the argument, however, we find that an (apparently)
more general result is actually established: if $f(x_1) \cdots f(x_n)$ is a
function $g(m, s)$ of the sample mean $m$ and standard deviation $s$, then the
probability density of $m$ and $s$ in samples of $n$ from the population $f(x)$
is $g(m, s)\,s^{n-2}$. This condition on $f(x)$ is of course satisfied if $f(x)$
is normal; in this note we shall conversely show that for $n \ge 3$ it
characterizes the normal distribution. In the proof it will be assumed that
$g(m, s)$ possesses partial derivatives of the first order, although a weaker
assumption would probably suffice.

Let us for the moment restrict the variables $x_i$ to values such that
$f(x_i) > 0$. After a change of notation we have

    $\phi(x_1) + \cdots + \phi(x_n) = h(u, v),$

where $\phi = \log f$, $u = x_1 + \cdots + x_n$,
$v = \tfrac{1}{2}(x_1^2 + \cdots + x_n^2)$. A differentiation yields

    $\phi'(x_i) = h_u + x_i h_v.$

Solving two of these equations for $h_v$, we find

(1)    $h_v = \dfrac{\phi'(x_i) - \phi'(x_j)}{x_i - x_j} \qquad (i \ne j),$

and, for $n \ge 3$, it follows that the right member of (1) is a constant, say
$2A$. Then

    $\phi'(x_i) - 2Ax_i = \phi'(x_j) - 2Ax_j$ = a constant $B$,

    $\phi(x) = Ax^2 + Bx + C.$

We now have $f(x) = e^{\phi(x)}$ whenever $f(x) > 0$; but since $f(x)$ is
continuous, this implies $f(x) = e^{\phi(x)}$ everywhere.
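
The role of equation (1) can be seen numerically: for a normal density the difference quotient of $\phi' = (\log f)'$ is the same for every pair of points, while for a non-normal density it is not. The Python sketch below is illustrative only; the normal parameters and the comparison density proportional to $e^{-x^4}$ are assumptions introduced here, not examples from the note.

```python
def difference_quotient(phi_prime, xi, xj):
    """Right member of (1): [phi'(xi) - phi'(xj)] / (xi - xj)."""
    return (phi_prime(xi) - phi_prime(xj)) / (xi - xj)


def normal_phi_prime(x, mean=1.0, sd=2.0):
    # phi = log f for a normal density, so phi'(x) = -(x - mean) / sd^2
    return -(x - mean) / sd ** 2


def quartic_phi_prime(x):
    # phi(x) = -x^4 + const, i.e. a non-normal density proportional to exp(-x^4)
    return -4.0 * x ** 3


pairs = [(-1.5, 0.5), (0.25, 2.0), (3.0, -2.0)]
normal_values = [difference_quotient(normal_phi_prime, a, b) for a, b in pairs]
quartic_values = [difference_quotient(quartic_phi_prime, a, b) for a, b in pairs]
print(normal_values)
print(quartic_values)
```

For the normal case the common value is $2A = -1/\sigma^2$, in agreement with $\phi(x) = Ax^2 + Bx + C$.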



NEWS AND NOTICES

Readers are invited to submit to the Secretary of the Institute news items of
general interest.

Personal Items

Dr. Holbrook Working has been appointed Chief Statistical Consultant on
Industrial Processes and Products in the Office of Production Research and
Development of the War Production Board.

Professor Harold Hotelling of Columbia University was the official
representative of the Institute of Mathematical Statistics at the Copernican
Quadricentennial Celebration which was held in New York City on May 24.

Dr. Edward B. Olds has taken a position with the Curtiss-Wright Corporation.

Dr. Nilan Norris is a Sergeant with the Fourth Statistical Control Unit of the
Fourth Air Force with headquarters at San Francisco, California.

Dr. Edward Helly is with the Signal Corps Training Program at Illinois
Institute of Technology.

Dr. C. W. Cotterman is in the United States Army at Camp Grant, Illinois.

Mr. M. D. Bingham has been commissioned an Ensign in the United States
Naval Reserve and is stationed at Fort Schuyler, New York.

Lt. George W. Petrie, USNR, is teaching in the Midshipmen’s School at
Notre Dame, Ind.


New Members 


The following persons have been elected to membership in the Institute:

Arias B., Jorge. Civ. Eng. (Guatemala) Eng., Rural Electrification Administration, 420
Locust St., St. Louis, Mo.

Bailey, A. L. B.S. (Michigan) Stat., American Mutual Alliance, 60 East 42 St., New York,

Becker, Harold W. Instr., Mare Island Trainee School, m Benson Ave., Vallejo, Calif.

Bernstein, Shirley R. B.S. (Carnegie Inst. Tech.) Res. Asst., United Steelworkers of
America, Pittsburgh, Pa. £601 Beverly Pl.

Bickerstaff, Asst. Prof. Thomas A. M.A. (Mississippi) Univ. of Miss., University, Miss.

Birnbaum, Asst. Prof. Z. William. Ph.D. (Lwow) Univ. of Wash., Seattle, Wash.

Brumbaugh, Prof. Martin A. Ph.D. (Pennsylvania) Univ. of Buffalo, Buffalo, N. Y.

Bur “ n gan State C0lL) InStr '' Mi0higan 0o,L > Lana- 

Cohen, Josef B. B S. (Chicago) Saga Fellow in Psychology, Cornell Univ., Ithaca. N. Y 
Cope, Asso. Prof. T. Freeman. Ph.D. (Chicago) Queens College, Flushing, N Y ' 

S?“r&hS e PhD'vT',. 3 ". 1 D ?" i "" Bur '»' 8 “-,Ckwk. 

r.=™::iLX^' umb ‘* ) Sr ' . . . 

SR KKS* **”“»> o—. u»iv„ 

*' Slnd,n ‘’ I ”'“‘ T “ h '’ Mta. n Bo, Sto i, Ri„
Gottfried, Bert A. A.M. (Columbia) Stat. Clerk, 4300 Kayivood Dr., Mt. Ranier, Md.

Hamilton, Prof. Thomas R. Ph.D. (Columbia) Texas A. & M. Coll., College Station, Tex.

Heide, J. D. M.S. (Iowa) Stat., U. S. Rubber Co., 1324 Altoona Ave., Eau Claire, Wisc.

Hilfer, Irma. M.A. (Columbia) Actuary, N. Y. C. Board of Transportation, 165 W. 97 St.,
New York, N. Y.

Howell, John M. B.A. (UCLA) Stat., Northrop Aircraft Inc., Hawthorne, Calif. 4140
W. 63 St., Los Angeles, Calif.

Hurwicz, Leonid. LL.M. (Warsaw) Res. Asso., Cowles Comm., Univ. of Chicago,
Chicago, Ill.

Kendall, Maurice G. M.A. (Cambridge) Stat., Chamber of Shipping of the United
Kingdom, Richmond House, Aldenham Rd., Bushey, Eng.

Klein, Lawrence R. B.A. (California) Teaching Fellow, Mass. Inst. Tech., Cambridge,
Mass.

Kuznets, George M. Ph.D. (California) Instr., Giannini Foundation, Univ. of Calif.,
Berkeley, Calif.

Landau, H. G. M.S. (Carnegie Inst. Tech.) Stat. Analyst, War Dept., Washington, D. C.
$408 $0 St., N.E.

Langmuir, Charles K. Ed.M. (Harvard) Carnegie Foundation, 437 West 59 St., New
York, N. Y.

Levy, Henry C. LL.B. (Fordham) Instr., N Y C C., New York, N. Y. 600 West 116 St.

Li, Jerome C. R. B.S. (Nanking) Student, Iowa State Coll., Ames, Iowa. $184 Lincoln
Way.

Lieberman, Jacob E. B.S. (Brooklyn Coll.) Jr. Stat., Census Bureau, Washington, D. C.
14 St., N.E.

Martin, Margaret F. M.A. (Minnesota) Instr., Columbia Univ., New York, N. Y. 1230
Amsterdam Ave.

Nash, Stanley W. B.A. (Coll. of Puget Sound) San Joaquin Experimental Range, O’Neals,
Calif.

Norton, Horace W. Ph.D. (London) Sr. Meteorologist, U. S. Weather Bur., Washington,
D. C. 8118 North First Rd., Arlington, Va.

Olds, Edward B. Ph.D. (Pittsburgh) Stat., Curtiss-Wright Corp., 898 Niagara Falls
Blvd., Buffalo, N. Y.

Preston, Bernard. C.P.A., 108 Park Ave., New York, N. Y.

Rosenblatt, David. B.S. (Coll. City of N. Y.) Asst. Stat., 1488 Whittier St., N.W.,
Washington, D. C.

Sard, Asst. Prof. Arthur. Ph.D. (Harvard) Queens College, Flushing, N. Y. 146-19
Beech Ave.

Schapiro, Anne. B.A. (Bryn Mawr) Jr. Analyst, Institute of Applied Econometrics,
350 W. 57 St., New York, N. Y.

Simpson, William B. Grad. Student, Columbia Univ., New York, N. Y.

Springer, Melvin D. M.S. (Illinois) Asst. Instr., Univ. of Illinois, Urbana, Ill.

Stein, Irving. B.S. (Mass. Inst. Tech.) Asso. Stat., War Dept., Washington, D. C. 611
Oglethorpe St.

Stergion, Andrew P. M.S. (Mass. Inst. Tech.) 1st Lt., USA, The Proving Center,
Aberdeen Proving Gd., Md.

Sternhell, Arthur I. B.A. (New York) Staff Asst., Metropolitan Life Ins. Co., 1938 E.
Fremont Ave., Parkchester, N. Y.

Thompson, Louis T. E. Ph.D. (Clark) Dir. Res. and Dev., Lukas-Harold Corp.,
Indianapolis, Ind. 340 East Maple Rd.

Tyler, Asst. Prof. George W. M.A. (Duke) Virginia Polytechnic Inst., Blacksburg, Va.

Working, Holbrook S. Ph.D. (Wisconsin) Chief Stat. Consultant, War Production
Board, Washington, D. C. Food Res. Inst., Stanford Univ., Calif.

The following persons have been elected to Junior membership in the Institute:

Blumenthal, Lydia. Hunter College, New York, N. Y. 1001 Lincoln Pl., Brooklyn, N. Y.

Gunlogson, Lee. Univ. of Minnesota, Minneapolis, Minn. 1906 Third Ave.

Heacock, Richard R. Oregon State Coll., Corvallis, Ore. P. O. Box 207, Seaside, Ore.

Locatelli, Humbert J. Columbia Univ., New York, N. Y. 44 Seaman Ave.

Mathisen, Harold C., Jr. Princeton Univ., Princeton, N. J. 4 Middle Dod Hall.

Murphy, Ray Bradford. Princeton Univ., Princeton, N. J. 28 Godfrey Rd., Upper
Montclair, N. J.

Peters, Edward J., Jr. Georgetown Univ., Washington, D. C. 126 St. James Pl., Atlantic
City, N. J.

Smith, Joan T. Univ. of Minnesota, St. Paul, Minn. 678 East Nebraska Ave.



SPECIAL COURSES IN STATISTICAL QUALITY CONTROL 


The application of statistics to quality control is now being furthered in a 
program in which the War Production Board and the U. S. Office of Education 
are cooperating to assist statisticians in various industrial areas to provide 
suitable courses of instruction sponsored by their own institutions. 

The general plan of the program has been influenced by two conclusions
drawn from the experience gained in ESMWT courses carried on by Stanford
University during 1942-43.$^1$ These conclusions were: (1) that a short
full-time course in statistical quality control tends to be peculiarly
effective; and (2) that it is vital to have the initial courses followed by
meetings in which the course members gather to report on applications they have
made and to receive encouragement and any needed assistance.

The giving of short full-time courses presents a problem of assembling a suitable
staff, since four instructors will ordinarily be needed. If this problem were
solved by arranging for a single staff to tour all the principal industrial
regions giving courses in quality control, the local leadership necessary for
establishing widespread use of statistical methods of quality control in
industry would not be developed. The program adopted seems to offer an effective
solution of these problems.

Under the program now in effect, the War Production Board, through its
Office of Production Research and Development, supplies an experienced person
to assist with the arrangement of courses and to participate in the
instruction.$^2$ Two of the instructors in each course will ordinarily be
provided by a local educational institution, which will also promote the course
and make necessary local arrangements through its institutional representative
of the Engineering Science and Management War Training program. It is not
considered necessary that the instructors provided by the institution have
previous experience with statistical quality control provided they are
sufficiently competent in the theory of sampling, but it is desirable that at
least one of them have practical experience with quality control. It may often
happen that one of the instructors can be a quality control man from a local
industrial establishment. The representative of the WPB will assist with
arrangements for bringing in one (or, where needed, two) additional outside
instructors.

The sponsoring institution's costs for the courses, which do not include the
salary and expenses of the representative of the WPB, may be provided through
the ESMWT program. The follow-up work with men who have taken the initial
courses may be arranged also as part of the ESMWT program of the educational
institution sponsoring the original course. The follow-up work should be handled
by a local instructor who participated in the original course.

1 A description of these courses offered by Stanford University appeared in the
Annals of Mathematical Statistics, March 1943, p. 96.

2 At present Professor Holbrook Working is serving in this capacity.

The two basic courses and the one follow-up course that have already been 
given by Stanford University were conducted under essentially the plan out¬ 
lined above, except that they did not have the benefit of assistance from the 
WPB. Three courses have thus far (May 25) been arranged under the new 
plan: one sponsored by Rhode Island State College, to be held during May 27 
to June 2 at Newport, and two sponsored by Stanford University, to be held 
respectively in Los Angeles, June 13 to 20, and in San Francisco, June 22 to 29. 
Preliminary steps have been taken toward the arrangement of several additional 
courses. 



REPORT OF THE NEW YORK MEETING OF THE INSTITUTE 


A joint meeting between the Institute and the American Society of Mechanical 
Engineers was held on Saturday, May 29, 1943 at the Engineering Societies 
Building, 29 West 39th Street, New York City. Of the ninety-five individuals 
attending the meeting, the following fifty-seven members of the Institute were 
present: 

Theodore W. Anderson, K. J. Arnold, Robert E. Bechhofer, B. M. Bennett, C. I. Bliss,
Mary E. Boozer, ? Boschac, A. H. Bowker, Burton H. Camp, A. G. Cohen Jr., H. F. Dodge,
C. Eisenhart, Mary L. Elveback, W. C. Flaherty, H. Goode, John I. Griffin, Charles C.
Grove, Frank E. Grubbs, E. J. Gumbel, Harold Hotelling, J. M. Juran, B. F. Kimball,
Lila Knudsen, Howard Levene, E. Vernon Lewis, Simon Lopata, Frank W. Lynch,
Henry Mann, E. C. Molina, N. Morrison, Philip J. McCarthy, Luis F. Nanni,
Franklin 9. Nelson, M. L. Norden, P. S. Olmstead, R. F. Passano, Edward Paulson,
G. A. D. Preinreich, A. C. Rosander, Arthur Sard, Henry Scheffé, Bernice Scherl,
Edward M. Schrock, L. W. Shaw, William B. Simpson, S. G. Small, Arthur Stein,
Andrew P. Stergion, M. Stevens, David F. Votaw Jr., A. Wald, Helen M. Walker,
W. A. Wallis, S. S. Wilks, J. Wolfowitz, L. C. Young.

The general topic of the meeting was Industrial Applications of Statistics. At 
the morning session the following papers were presented, with Professor Harold 
Hotelling presiding: 

1. On the Theory of Runs with some Application to Quality Control.
J. Wolfowitz.

2. On the Presentation of Data as Evidence. 

Churchill Eisenhart. 

At the afternoon session, the following papers were presented with Mr. E. C. 
Molina as Chairman: 

1. A Sampling Inspection Plan for Continuous Production.
H. F. Dodge.

2. Tolerances and Product Acceptability.
L. C. Young.

A meeting of the Board of Directors was held after the afternoon session. 

Edwin G. Olds 

Secretary


THE COMPARISON OF DIFFERENT SCALES OF MEASUREMENT FOR
EXPERIMENTAL RESULTS$^{1,2}$

By W. G. Cochran
Iowa State College

1. Introduction. In some fields of research, the development of a satisfactory
method for measuring the effects of experimental treatments constitutes a
difficult problem. The estimation of the vitamin content of preparations of
foods furnishes a good example; for most of the vitamins several years of work
were required to construct a reliable method of assay. In other cases, where the
ideal method for measuring treatment responses is costly or troublesome, a
search may be made for a more convenient substitute. Thus in pasture or
forage-crop experiments the species composition of a plot may be estimated by
eye inspection as a substitute for a complete botanical separation. As a third
example we may quote experiments in cookery, where the flavor and quality of the
dishes are subject to the whims of human taste. Frequently a panel of judges is
employed, each of whom scores the dishes independently. It is not easy to
determine how the panel should be chosen, nor how representative its verdicts
are of consumer preferences in general.

When such problems are investigated, experiments may be carried out
specifically for the purpose of comparing two or more methods or scales of
measurement. Where the process of measurement affects only the final stages of
the experiment, as in the last two examples quoted above, all that is necessary
is to score the same experiment by the various scales under consideration. In
comparing two different methods of assaying vitamins, on the other hand,
independent experiments are frequently required, the only common feature being
that the same set of treatments is tested in both experiments.

In the interpretation of the results of such experiments, two types of compari¬ 
son are of general interest One concerns the relations between the scales. It 
may be summed up rather loosely in the question • Are the effects of the treat¬ 
ments the same in all scales? For a more exact formulation, consider the case 
of two scales, which is probably the most frequent m practice. Let , £>, 
be the true means of the fth treatment as measured on the two scales. We may 
wish to examine the following hypotheses: 

(i) Scales equivalent. 

(1 ) £h = £21 , (all t ); 

(li) Scales equivalent, apart from a constant difference. 

(2) la = + e, (all f); 

1 Paper presented at a meeting of the Institute of Mathematical Statistics, Washington,
D. C., June 18, 1943.

2 Journal Paper No. J-1136 of the Iowa Agricultural Experiment Station, Ames, Iowa.
Project 514.

(iii) Scales linearly related:

(3)    $\alpha \xi_{1t} + \beta \xi_{2t} = \gamma$,    (all $t$);

(iv) Relation monotonic, but not linear:

(4)    $\xi_{1t} = f(\xi_{2t}, \alpha, \beta, \cdots)$,    (all $t$);

where the function $f$ is strictly monotonic.

In this case the two scales are mutually consistent in that they place any set 
of treatments in the same order. The ratio of a treatment difference in one scale 
to the corresponding difference in the other scale is, however, not constant. 

(v) Relation not monotonic. Here the scales do not place the treatments in the
same order and consequently are not satisfactory substitutes for each other.

The second question concerns the relative accuracy or sensitivity of the two 
scales. For practical purposes this question may be put as follows: how many 
replications are required with the second scale to attain the accuracy given by r 
replications with the first scale? It is clear that the answer depends both on the 
experimental errors associated with the scales and on the magnitudes of the
treatment effects in the two scales. For example, Coward [1] reports that in
the assay of vitamin D, male rats give a higher experimental error than females,
yet provide a more accurate assay because they are more responsive. The
relative accuracy may be different in different parts of the two scales. This is likely
to happen whenever the relation between the scales is of type (iv) above. 

This paper gives a preliminary discussion of some of the simpler questions 
raised above, to which recent work in multivariate analysis is applicable. A 
complete solution for small sample work appears to demand considerable further 
development in the distribution theory of multivariate analysis. 

The discussion is confined to the case in which all scales measure the same 
experiment. The case where each scale requires a separate experiment may be 
expected to be somewhat simpler, but cannot conveniently be treated as a special 
case of the procedure for a single experiment. 

2. Assumptions. Let $x_1, x_2, \cdots, x_p$ denote measurements on the $p$
scales and let $n_1$ and $n_2$ be the numbers of degrees of freedom for
treatments and error respectively. The experimental data furnish a joint
analysis of variance and covariance of the $p$ variates as follows:

                               Sum of squares
                      d.f.     or products

       Mean             1        $m_{ij}$
(5)    Treatments     $n_1$      $a_{ij}$
       Error          $n_2$      $b_{ij}$

It will be assumed that $x_1, \cdots, x_p$ follow a multivariate normal
distribution, and that for any pair of variates $x_i$, $x_j$ the error mean
covariance $\sigma_{ij}$ is constant throughout the experiment (though it may
vary as $i$ and $j$ vary). Thus the quantities $b_{ij}$ follow the standard
joint distribution, Wishart [16], of sums of squares and products, while the
quantities $m_{ij}$ and $a_{ij}$ follow the corresponding non-central
distributions, and the three sets of distributions are independent.
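
The three lines of table (5) can be computed directly from the raw readings. The following is a minimal sketch only, assuming a simple group comparison with equal replication; the function name `ssp_one_way` and the data layout are illustrative assumptions, not part of the paper.

```python
def ssp_one_way(data):
    """Joint analysis of variance and covariance for a simple group comparison.

    data[t][k] holds the p scale readings for replicate k of treatment t
    (equal replication is assumed). Returns (m, a, b, n1, n2): the p x p
    matrices of sums of squares and products for the Mean, Treatments and
    Error lines, and the treatment and error degrees of freedom.
    """
    n_treat = len(data)
    r = len(data[0])
    p = len(data[0][0])
    n_obs = n_treat * r

    grand = [sum(obs[i] for trt in data for obs in trt) / n_obs for i in range(p)]
    t_means = [[sum(obs[i] for obs in trt) / r for i in range(p)] for trt in data]

    # Mean line: correction term for the grand mean.
    m = [[n_obs * grand[i] * grand[j] for j in range(p)] for i in range(p)]
    # Treatments line: deviations of treatment means from the grand mean.
    a = [[r * sum((tm[i] - grand[i]) * (tm[j] - grand[j]) for tm in t_means)
          for j in range(p)] for i in range(p)]
    # Error line: deviations of observations from their treatment means.
    b = [[sum((obs[i] - tm[i]) * (obs[j] - tm[j])
              for trt, tm in zip(data, t_means) for obs in trt)
          for j in range(p)] for i in range(p)]
    return m, a, b, n_treat - 1, n_treat * (r - 1)


# Hypothetical data: 3 treatments, 2 replicates, 2 scales.
data = [[[1.0, 2.0], [3.0, 2.0]],
        [[4.0, 5.0], [6.0, 7.0]],
        [[2.0, 1.0], [2.0, 3.0]]]
m, a, b, n1, n2 = ssp_one_way(data)
```

The three matrices add up to the raw sums of squares and products, which gives a convenient arithmetic check. A randomized block layout would need an additional Blocks line, which this sketch does not compute.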

3. Tests for equivalence. If there are only two scales, a test for equivalence
is obtained from elementary techniques. An analysis of variance similar to (5)
is computed on the differences between the two scales for every observation. If
equations (1) hold in the population, the sums of squares for the Mean,
Treatments and Error are distributed independently as
$\chi^2(\sigma_{11} + \sigma_{22} - 2\sigma_{12})$. The pooled mean square for
the Mean and Treatments may therefore be compared with the Error mean square
in a variance-ratio test, the degrees of freedom being $(n_1 + 1)$ and $n_2$.
If the scales are equivalent apart from a constant difference, the same result
is valid for Treatments and Error, while the mean square for the Mean is
proportional to a non-central $\chi^2$. Thus separate $z$- or $F$-tests on the
Mean and Treatments assist in distinguishing between hypotheses (1) and (2).
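
The variance ratios described in this section can be sketched as follows, assuming a simple group comparison with equal replication. The function name `equivalence_f_ratios` and the input layout are assumptions made for illustration; significance levels would still be read from variance-ratio tables.

```python
def equivalence_f_ratios(d):
    """d[t][k] is the difference (scale 1 minus scale 2) for replicate k of
    treatment t. Returns the variance ratios used to judge hypotheses (1), (2)."""
    n_treat = len(d)
    r = len(d[0])
    n_obs = n_treat * r
    grand = sum(sum(row) for row in d) / n_obs
    t_means = [sum(row) / r for row in d]

    ss_mean = n_obs * grand ** 2
    ss_treat = r * sum((tm - grand) ** 2 for tm in t_means)
    ss_error = sum((v - tm) ** 2 for row, tm in zip(d, t_means) for v in row)

    n1, n2 = n_treat - 1, n_treat * (r - 1)
    ms_error = ss_error / n2
    return {
        # Pooled Mean + Treatments against Error: d.f. (n1 + 1, n2).
        "pooled": ((ss_mean + ss_treat) / (n1 + 1)) / ms_error,
        # Mean alone against Error: d.f. (1, n2).
        "mean": ss_mean / ms_error,
        # Treatments alone against Error: d.f. (n1, n2).
        "treatments": (ss_treat / n1) / ms_error,
        "df": (n1, n2),
    }


# Hypothetical differences for 3 treatments with 2 replicates each.
ratios = equivalence_f_ratios([[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]])
```

In this made-up example the Treatments ratio is zero while the Mean ratio is large, the pattern expected when the scales differ only by a constant.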

4. More than two scales. Let $\xi_{it}$ be the true mean of the $t$th treatment
as measured on the $i$th scale. The first two hypotheses may now be written
respectively:

(1')    $\xi_{it} = \xi_t$

(2')    $\xi_{it} = \xi_t + \epsilon_i$

for $i = 1, 2, \cdots, p$. The quantities $\epsilon_i$, whose sum may be assumed
zero, measure the constant differences among the scales.

If the interactions of all components with Scales are computed, the analysis
of variance extends formally, with the following separation of degrees of freedom:

                                    d.f.

       Mean × Scales              $(p - 1)$
(6)    Treatments × Scales        $n_1(p - 1)$
       Error                      $n_2(p - 1)$


The three lines in the analysis play the same roles as before in relation to
hypotheses (1') and (2'). When $p > 2$, however, it may be shown that the
three sums of squares are not distributed as multiples of $\chi^2$ unless (i)
all scales have the same error variance and (ii) every pair of scales has the
same correlation coefficient. Where these conditions are reasonably well
satisfied, as happens possibly when experienced judges employ a similar scoring
system, the above analysis supplies approximate tests. But with scales which
differ widely in their experimental errors or in their degrees of
intercorrelation, the validity of variance-ratio tests is open to more serious
question.

In order to obtain an exact test, we may note that hypothesis (1') is closely
related to the Wilks-Lawley hypothesis (Wilks [15], Lawley [9], Hsu [7]) that the
means of $k$ populations are all equal. If each treatment denotes a separate
population, the Wilks-Lawley hypothesis states that

(7)    $\xi_{it} = \xi_i$    $(t = 1, 2, \cdots, n_1 + 1)$.

Since this differs from (1') only in the interchange of the letters $i$ and $t$,
it is clear that the two hypotheses may be subjected to the same kind of test.

For the details of the procedure we first divide the $(p - 1)$ comparisons among
scales into $(p - 1)$ single comparisons by the introduction of a set of variates
$y_i$ $(i = 1, 2, \cdots, p - 1)$:

(8)    $y_i = \sum_{j=1}^{p} \lambda_{ij} x_j$.

Any set of $\lambda$'s may be chosen, provided that they are linearly
independent and that

(9)    $\sum_{j=1}^{p} \lambda_{ij} = 0$,    $(i = 1, 2, \cdots, (p - 1))$.

Thus with three scales we might use $y_1 = x_1 - x_2$, $y_2 = x_1 - x_3$ or
$y_1 = 2x_1 - x_2 - x_3$, $y_2 = x_2 - x_3$.

The next step is to compute an analysis of variance and covariance of the $y$
variates, as follows:

                                Sum of squares
                       d.f.     or products

        Mean             1        $m'_{ij}$
(10)    Treatments     $n_1$      $a'_{ij}$
        Error          $n_2$      $b'_{ij}$

If hypothesis (1') holds, it follows from (9) that the three sets of quantities
$m'_{ij}$, $a'_{ij}$ and $b'_{ij}$ all follow the standard joint distribution for
sums of squares and products. Hence Wilks' test (Wilks [15], Pearson and
Wilks [11], Hsu [7]) for the equality of the means of $k$ populations may be
applied. For a single test of hypothesis (1') we may use

(11)    $W = \dfrac{| b'_{ij} |}{| b'_{ij} + m'_{ij} + a'_{ij} |}$.

As before, if $W$ is significant we may test whether the deviation is due to
constant differences or to other types of difference among the scales by
calculating

(12)    $W_m = \dfrac{| b'_{ij} |}{| b'_{ij} + m'_{ij} |}$

and

(13)    $W_t = \dfrac{| b'_{ij} |}{| b'_{ij} + a'_{ij} |}$.


The flexibility of analysis of variance tests is not sacrificed; in particular we
may test any desired subgroup of the treatments or of the scales. When there
are only two scales the tests reduce to those given in section 3.
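
The criteria $W$, $W_m$ and $W_t$ are ratios of determinants of the sums of squares and products of the $y$ variates. The sketch below assumes the Mean, Treatments and Error matrices of the $x$ variates are already available and that a contrast matrix with rows summing to zero is supplied; all function names are illustrative assumptions.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def contrast_ssp(s, lam):
    """Sums of squares and products of y = lam x, i.e. lam * s * lam^T."""
    q, p = len(lam), len(s)
    return [[sum(lam[i][k] * s[k][l] * lam[j][l]
                 for k in range(p) for l in range(p))
             for j in range(q)] for i in range(q)]


def add(x, y):
    """Element-wise sum of two matrices of equal shape."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def wilks_criteria(m, a, b, lam):
    """Return W, W_m and W_t computed from the contrasts defined by lam."""
    m_y, a_y, b_y = (contrast_ssp(s, lam) for s in (m, a, b))
    d = det(b_y)
    return {
        "W": d / det(add(add(b_y, m_y), a_y)),
        "W_m": d / det(add(b_y, m_y)),
        "W_t": d / det(add(b_y, a_y)),
    }


# Hypothetical two-scale example with the single contrast y = x1 - x2.
crit = wilks_criteria(m=[[24.0, 0.0], [0.0, 0.0]],
                      a=[[0.0, 0.0], [0.0, 0.0]],
                      b=[[10.0, 0.0], [0.0, 0.0]],
                      lam=[[1.0, -1.0]])
```

With a single contrast each determinant is a plain sum of squares, so the three criteria reduce to ratios of the kind used in the two-scale analysis.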

The tests are invariant under homogeneous linear transformations of the $y$'s,
which explains why the form of the subdivision of the scale comparisons is
immaterial. In fact for purposes of computation it is not necessary to introduce
the $y$'s. By taking a simple transformation and expressing $a'_{ij}$ in terms
of $a_{ij}$, etc., we may express $W$ directly in terms of the $x$'s, as follows:

(14)    $W = \dfrac{\sum_{i,j} B_{ij}}{\sum_{i,j} (B + M + A)_{ij}}$,

where $B_{ij}$, $(B + M + A)_{ij}$ are respectively the co-factors of the matrices
$(b_{ij})$, $(b_{ij} + m_{ij} + a_{ij})$. Analogous expressions hold for $W_m$ and
$W_t$. In practice it will often be preferable to compute the $y$'s in order that
particular comparisons among the scale variates may be examined in detail.
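
The agreement between the co-factor form and the contrast form of $W$ can be checked numerically. The sketch below uses two made-up symmetric matrices standing for $(b_{ij})$ and $(b_{ij} + m_{ij} + a_{ij})$ with three scales; the helper names and the numbers are assumptions for illustration.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def sum_cofactors(s):
    """Sum of all co-factors of the square matrix s."""
    n = len(s)
    total = 0.0
    for i in range(n):
        for j in range(n):
            minor = [[s[r][c] for c in range(n) if c != j]
                     for r in range(n) if r != i]
            total += (-1) ** (i + j) * det(minor)
    return total


def contrast_ssp(s, lam):
    """Sums of squares and products of y = lam x, i.e. lam * s * lam^T."""
    q, p = len(lam), len(s)
    return [[sum(lam[i][k] * s[k][l] * lam[j][l]
                 for k in range(p) for l in range(p))
             for j in range(q)] for i in range(q)]


# Hypothetical matrices: error SSP and total (error + mean + treatments) SSP.
b = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
total = [[6.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 2.0, 7.0]]

# W from the co-factor form.
w_cofactor = sum_cofactors(b) / sum_cofactors(total)

# W from the contrasts y1 = x1 - x2, y2 = x1 - x3.
lam = [[1.0, -1.0, 0.0], [1.0, 0.0, -1.0]]
w_contrast = det(contrast_ssp(b, lam)) / det(contrast_ssp(total, lam))
```

Any other set of linearly independent contrasts whose coefficients sum to zero should give the same value, which is the invariance noted above.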

The form of the frequency distribution has been worked out by Wilks [15].
For small values of $\nu_1$ and $p$, the test of significance can be referred to
the recent tables of the significance levels of the incomplete Beta-function,
Thompson [13], or to variance-ratio tables. Such cases are listed below, from
Wilks [15] and Hsu [7]. In our notation, $\nu_1$ is taken as $(n_1 + 1)$ in
equation (11), as 1 in equation (12) and as $n_1$ in equation (13).

$p = 3$, $\nu_1 \geq 1$:    $f(W) \propto W^{\frac{1}{2}(n_2 - 3)} (1 - W^{\frac{1}{2}})^{\nu_1 - 1}$

    $F\{2\nu_1, 2(n_2 - 1)\} = \dfrac{(n_2 - 1)(1 - W^{\frac{1}{2}})}{\nu_1 W^{\frac{1}{2}}}$

$\nu_1 = 1$:    $f(W) \propto W^{\frac{1}{2}(n_2 - p)} (1 - W)^{\frac{1}{2}(p - 3)}$

    $F\{p - 1, n_2 - p + 2\} = \dfrac{(n_2 - p + 2)(1 - W)}{(p - 1) W}$

This distribution applies to all tests made on the Mean, equation (12), and all
cases where a single degree of freedom is isolated from the treatment comparisons.

$\nu_1 = 2$:    $f(W) \propto W^{\frac{1}{2}(n_2 - p)} (1 - W^{\frac{1}{2}})^{p - 2}$

    $F\{2(p - 1), 2(n_2 - p + 2)\} = \dfrac{(n_2 - p + 2)(1 - W^{\frac{1}{2}})}{(p - 1) W^{\frac{1}{2}}}$

A tabulation of the distributions for four and five scales would be useful.
Hsu [7] has shown that as $n_2$ becomes large, the distribution of
$-n_2 \log W$ tends to that of $\chi^2$ with $\nu_1(p - 1)$ degrees of freedom.
In general, this approximation does not agree very well with the exact
distributions above unless $n_2$ exceeds 60.
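
The conversions from $W$ to a variance ratio can be sketched as small helper functions. The names are illustrative assumptions, and the error degrees of freedom in the single-degree-of-freedom case are taken as $n_2 - p + 2$, the value consistent with the density quoted for that case. Each function returns the variance ratio with its two degrees of freedom, to be referred to variance-ratio tables.

```python
import math


def f_three_scales(w, nu1, n2):
    """Case p = 3, any nu1: returns (F, df1, df2)."""
    root = math.sqrt(w)
    return (n2 - 1) * (1 - root) / (nu1 * root), 2 * nu1, 2 * (n2 - 1)


def f_single_df(w, p, n2):
    """Case nu1 = 1, any p: returns (F, df1, df2)."""
    return (n2 - p + 2) * (1 - w) / ((p - 1) * w), p - 1, n2 - p + 2


def f_two_df(w, p, n2):
    """Case nu1 = 2, any p: returns (F, df1, df2)."""
    root = math.sqrt(w)
    return ((n2 - p + 2) * (1 - root) / ((p - 1) * root),
            2 * (p - 1), 2 * (n2 - p + 2))


def chi_square_approx(w, p, nu1, n2):
    """Large-sample form: (-n2 log W, degrees of freedom nu1 (p - 1))."""
    return -n2 * math.log(w), nu1 * (p - 1)
```

For $p = 3$ and $\nu_1 = 2$ the first and third functions give the same ratio and degrees of freedom, which is a quick consistency check on the two formulas.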


5. Interpretation as a problem in canonical correlations. As an introduction
to the methods that will be used in testing the hypothesis of linearity, we may
note that hypotheses (1') and (2') can be described in terms of canonical
correlations. Fisher [5] has pointed out that the roots $\theta$ of the equation

(15)    $| a_{ij} - \theta(a_{ij} + b_{ij}) | = 0$

are the squares of the sample canonical correlations between the $x$-variates and
a set of $n_1$ dummy variates which represent the $n_1$ degrees of freedom among
treatments. In order to obtain the corresponding equation for the population
correlations, we may suppose that $n_1$ and $p$ remain constant while the number
of replicates $r'$ and consequently $n_2$ increase without limit. After the
removal of a common factor $r'$, equation (15) becomes

(16)    $| \psi_{ij} - \rho^2(\psi_{ij} + \nu\sigma_{ij}) | = 0$,

where

(17)    $\psi_{ij} = \sum_t (\xi_{it} - \bar{\xi}_i)(\xi_{jt} - \bar{\xi}_j)$.

The value of the coefficient $\nu$ depends on the type of experimental design.
For a randomized block layout, $\nu = n_1$, and for a simple group comparison
$\nu = (n_1 + 1)$.

Now if hypothesis (2') is true, i.e., $\xi_{it} = \xi_t + \epsilon_i$, it follows
that $\psi_{ij}$ is independent of $i$ and $j$. In this event equation (16) has
$(p - 1)$ roots $\rho^2$ which are identically zero. The remaining root
corresponds to the best discriminant function, Fisher [5], and does not vanish
unless the treatments have no effects on any of the $x$-variates.

Let $\sum \beta_i x_i$ be a population canonical variate for the scale variables.
The coefficients $\beta_i$ satisfy the equations

(18)    $\sum_j \beta_j \{\psi_{ij} - \rho^2(\psi_{ij} + \nu\sigma_{ij})\} = 0$,    $i = 1, \cdots, p$.

For a zero root $\rho^2 = 0$ we have $\psi_{ij}$ = constant. Hence if a zero root
is substituted, equation (18) degenerates into

(19)    $\beta_1 + \beta_2 + \cdots + \beta_p = 0$.

To summarize, hypothesis (2') specifies that (i) $(p - 1)$ of the population
canonical correlations vanish and (ii) any variate $\sum \beta_i x_i$ is a
canonical scale variate corresponding to a zero root, provided that equation (19)
is satisfied. Analogous results hold for hypothesis (1'); in this case we replace
the Treatments line of the analysis of variance by the (Treatments + Mean) line.
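
The step from (18) to (19) can be written out explicitly. This is a sketch only: it takes $\psi_{ij}$ to be the sum of products of treatment deviations on scales $i$ and $j$, and it writes $\psi$ (without subscripts) and $\bar{\xi}$ for the common value and the mean of the $\xi_t$, symbols introduced here for convenience and not used in the paper.

```latex
\begin{align*}
\xi_{it} - \bar{\xi}_i &= (\xi_t + \epsilon_i) - (\bar{\xi} + \epsilon_i)
                        = \xi_t - \bar{\xi}, \\
\psi_{ij} &= \sum_t (\xi_t - \bar{\xi})^2 = \psi
             \quad \text{for all } i, j, \\
\sum_j \beta_j \psi_{ij} &= \psi \sum_j \beta_j = 0
             \quad \text{when } \rho^2 = 0 .
\end{align*}
```

Provided $\psi \neq 0$, the last line holds exactly when the $\beta_j$ sum to zero, which leaves $(p - 1)$ linearly independent solutions, one for each vanishing root.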

6. Test for linear relationship—two scales. We may assume $n_1 \geq 2$,
otherwise no test of linearity is possible. If the values of $\alpha$, $\beta$
and $\gamma$ in equations (3) are known, the problem can be reduced to that of
testing hypothesis (1) or (2). Since this case is unlikely to be encountered
frequently in practice, further details are omitted.

When $\alpha$, $\beta$ and $\gamma$ are unknown, we may theoretically replace the
variates $x_1$ and $x_2$ by $v_1 = \alpha x_1 + \beta x_2$ and
$v_2 = \mu_1 x_1 + \mu_2 x_2$, where $\mu_1$ and $\mu_2$ are chosen so that $v_1$
and $v_2$ are independently distributed. If hypothesis (3) holds, it follows
from (17) that in terms of the $v$'s, $\psi_{11} = \psi_{12} = 0$. Since in
addition $\sigma_{12} = 0$, the two roots of equation (16) are

(20)    $\rho^2 = 0$ and $\rho^2 = \psi_{22}/(\psi_{22} + \nu\sigma_{22})$.

Thus hypothesis (3) implies that one of the population canonical correlations
vanishes. Unlike the previous case, however, we cannot construct the
corresponding canonical variate, which requires knowledge of $\alpha$ and $\beta$.

The selection of a sample test criterion opens up some difficulties. Pending
further elucidation of the problem, the natural choice seems to be the square
$r_2^2$ of the lower sample canonical correlation, or the equivalent quantity
$h_2 = r_2^2/(1 - r_2^2)$, where $h_2$ is the lower root of the equation:

(21)    $| a_{ij} - h b_{ij} | = 0$.

It appears likely, however, that $r_2^2$ and $h_2$ are not sufficient estimates
of the corresponding population parameters.

When $n_2$ is large, Hsu [8] has shown that the distribution of $n_2 h_2$ tends
to that of $\chi^2$ with $(n_1 - 1)$ degrees of freedom. A considerable advance
towards the small-sample distribution is obtainable from Madow [10], who
developed an expression for the exact distribution of $r_1^2$ and $r_2^2$ when
one of the population correlations is different from zero. In our notation this
result, which is an important generalization of the distribution found by
Fisher [5] and Girshick [6], may be written as follows:


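(22)    $\dfrac{(n_1 + n_2 - 2)!}{4\pi (n_1 - 2)!\,(n_2 - 2)!} (r_1^2 r_2^2)^{\frac{1}{2}(n_1 - 3)} \{(1 - r_1^2)(1 - r_2^2)\}^{\frac{1}{2}(n_2 - 3)} (r_1^2 - r_2^2)\, dr_1^2\, dr_2^2$

        $\times\ (1 - \rho_1^2)^{\frac{1}{2}(n_1 + n_2)} \displaystyle\int_{r_2^2}^{r_1^2} F\left(\dfrac{n_1 + n_2}{2}, \dfrac{n_1 + n_2}{2}, \dfrac{n_1}{2}, \rho_1^2 y\right) \dfrac{dy}{\sqrt{(r_1^2 - y)(y - r_2^2)}}$,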


where $\rho_1$ is the non-vanishing population correlation. It is evident from
the form of (22) that the distribution of $r_2^2$ or $h_2$ involves $\rho_1$. The
conditional distribution of $h_2/h_1$ may be relatively insensitive to changes
in $\rho_1$, though even this distribution does not seem entirely independent
of $\rho_1$.

When $\rho_1$ is unity, the small-sample distribution of $h_2$ is that of the
ratio of two independent sums of squares, i.e., $h_2 = (n_1 - 1)e^{2z}/n_2$, with
$(n_1 - 1)$ and $n_2$ degrees of freedom. This result is a particular case of a
more general result proved in section 8. From (20) it is seen that $\rho_1$ is
close to unity when $\psi_{22}$ is large relative to $\sigma_{22}$, i.e., when the
real differences among the treatments are large relative to the experimental
errors. In the absence of a usable exact solution, the $F$-distribution may be a
better approximation than the large-sample distribution of $h_2$ for data where
$r_1^2$ is found to be close to unity, though proof of this statement is not yet
available.

If it is desired to test hypothesis (3) with the additional assumption that
$\gamma = 0$, we replace $a_{ij}$ by $(a_{ij} + m_{ij})$ in equation (21) for
$h_2$, and $n_1$ by $(n_1 + 1)$ in the distribution theory.
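
For two scales the determinantal equation for $h$ is a quadratic, so the lower root and the large-sample criterion can be computed in closed form. The sketch below assumes symmetric 2 x 2 Treatments and Error matrices; the function names and the example numbers are illustrative assumptions.

```python
import math


def roots_2x2(a, b):
    """Roots (h1 >= h2) of |a - h b| = 0 for symmetric 2 x 2 matrices a, b."""
    qa = b[0][0] * b[1][1] - b[0][1] ** 2
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2 * a[0][1] * b[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] ** 2
    disc = math.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
    return (-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)


def linearity_criterion(a, b, n1, n2):
    """Lower root h2, the matching squared correlation, and n2 * h2 with its
    large-sample degrees of freedom (n1 - 1)."""
    _, h2 = roots_2x2(a, b)
    return {"h2": h2, "r2_squared": h2 / (1 + h2), "n2_h2": n2 * h2, "df": n1 - 1}


# Hypothetical Treatments and Error matrices.
result = linearity_criterion(a=[[4.0, 2.0], [2.0, 3.0]],
                             b=[[2.0, 0.0], [0.0, 1.0]],
                             n1=4, n2=10)
```

The quantity `n2_h2` would be compared with $\chi^2$ tables only when the error degrees of freedom are large, as the text cautions.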


7. Connection with the method of least squares. The previous approach has
an interesting connection with the method of least squares. We are required to
test the linearity of relationship between $(n_1 + 1)$ pairs of means
$(\bar{x}_{1t}, \bar{x}_{2t})$. Both variates are subject to error and the errors
are correlated; with $r'$ replications the population variances and covariance of
these means are $\sigma_{11}/r'$, $\sigma_{22}/r'$ and $\sigma_{12}/r'$. For these
unknown quantities we have sample estimates $b_{11}/n_2 r'$, $b_{22}/n_2 r'$ and
$b_{12}/n_2 r'$ respectively, derived from the Error line of the analysis of
variance.

The procedure suggested by the method of least squares is to estimate the
parameters of the line and use the deviations of the points
$(\bar{x}_{1t}, \bar{x}_{2t})$ from the line for a test of linearity. If the
population variances were known, the unknown quantities $\alpha$, $\beta$,
$\gamma$ and $\xi_{it}$ would be estimated by minimizing the quadratic form:

(23)    $\sigma^{11} \sum_{t=1}^{n_1+1} r'(\bar{x}_{1t} - \xi_{1t})^2 + 2\sigma^{12} \sum_{t=1}^{n_1+1} r'(\bar{x}_{1t} - \xi_{1t})(\bar{x}_{2t} - \xi_{2t}) + \sigma^{22} \sum_{t=1}^{n_1+1} r'(\bar{x}_{2t} - \xi_{2t})^2$,

subject to the linear relations (3). Here $(\sigma^{ij})$ is the matrix inverse
to $(\sigma_{ij})$. On substitution of the estimates, expression (23), which is
positive definite, would serve as a "sum of squares" of deviations from the line
and therefore as a test criterion. This criterion is of course a direct
generalization of the weighted sum of squares which is used when the errors are
independent.

Van Uven [14] gave an elegant method by which the sum of squares of deviations
can be found directly, before solving for any of the unknown quantities.
In our notation he showed that the sum of squares of deviations is the smaller
root $H_2$ of the equation

(24)    $| a_{ij} - H\sigma_{ij} | = 0$,

where $a_{ij}$ is as before the treatments sum of squares or products.

Suppose that in default of knowledge of the $\sigma_{ij}$ we derive the weights
from the sample estimates $b_{ij}/n_2$; i.e., we minimize (23) with $b^{ij}$ in
place of $\sigma^{ij}$, where $(b^{ij}) = (b_{ij}/n_2)^{-1}$. In this case the
method of Van Uven shows that the sum of squares of deviations from the
best-fitting line is the smaller root $H_2$ of the equation

(25)    $\left| a_{ij} - H \dfrac{b_{ij}}{n_2} \right| = 0$.


Comparing (25) with (21) we find $H_2 = n_2 h_2$. Consequently the least squares
approach, with sample weights substituted in (23) for the unknown true weights,
leads to $h_2$ as a test criterion. Further, Hsu's [8] proof that the
distribution of $n_2 h_2$ tends to $\chi^2$ with $(n_1 - 1)$ degrees of freedom
establishes for this case the standard least-squares result for the distribution
of the residual sum of squares, namely that when the population weights are
known, the residual sum of squares is distributed as $\chi^2$, with degrees of
freedom equal to the number of points, $2(n_1 + 1)$, minus the number of
independent unknowns, $(n_1 + 3)$. By a transformation of the $x$-variates to
independent variables, this result can be obtained alternatively from a theorem
by Deming [2].
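
The relation $H_2 = n_2 h_2$ and the least-squares interpretation can be checked numerically for two scales. The sketch below rests on one stated assumption: for a line with normal direction $c = (\alpha, \beta)$, the minimized weighted sum of squares of deviations equals $c'Ac / c'Sc$, where $A = (a_{ij})$ and $S = (b_{ij}/n_2)$, so the best-fitting line minimizes this ratio over directions. A coarse grid search over directions is used in place of Van Uven's method, and all names and numbers are illustrative.

```python
import math


def smaller_root(a, s):
    """Smaller root of |a - H s| = 0 for symmetric 2 x 2 matrices a, s."""
    qa = s[0][0] * s[1][1] - s[0][1] ** 2
    qb = -(a[0][0] * s[1][1] + a[1][1] * s[0][0] - 2 * a[0][1] * s[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] ** 2
    disc = math.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
    return (-qb - disc) / (2 * qa)


def min_weighted_ss(a, s, steps=20000):
    """Grid search for the minimum of c'Ac / c'Sc over directions c."""
    best = float("inf")
    for k in range(steps):
        theta = math.pi * k / steps
        c1, c2 = math.cos(theta), math.sin(theta)
        num = a[0][0] * c1 * c1 + 2 * a[0][1] * c1 * c2 + a[1][1] * c2 * c2
        den = s[0][0] * c1 * c1 + 2 * s[0][1] * c1 * c2 + s[1][1] * c2 * c2
        best = min(best, num / den)
    return best


# Hypothetical Treatments and Error matrices with n2 = 10 error d.f.
a = [[4.0, 2.0], [2.0, 3.0]]
b = [[2.0, 0.0], [0.0, 1.0]]
n2 = 10
s = [[v / n2 for v in row] for row in b]

h2 = smaller_root(a, b)        # lower root of (21)
big_h2 = smaller_root(a, s)    # smaller root of (25)
grid_min = min_weighted_ss(a, s)
```

In this example the smaller root of (25), ten times the lower root of (21), and the grid minimum all agree to within the grid resolution.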





8. Test for linear relationship—more than two scales. The extension of
hypothesis (3) to the case of $p$ scales can be expressed by means of the equations

(3')    $\alpha_i \xi_{1t} + \beta_i \xi_{it} = \gamma_i$;    $(i = 2, \cdots, p)$ $(t = 1, \cdots, n_1 + 1)$.

The equations, $(p - 1)(n_1 + 1)$ in number, postulate a linear relation between
$x_1$ and every other variate and consequently imply a linear relation between
any pair of variates $x_i$ and $x_j$.

Consider the variates $v_i = \alpha_i x_1 + \beta_i x_i$, $(i = 2, \cdots, p)$.
For $v_1$ we choose the linear function of the $x$'s which is independent of
$v_2, \cdots, v_p$. Thus in equation (16) for the population canonical
correlations we have $\psi_{ij} = 0$, $(i, j \geq 2)$ and $\sigma_{1j} = 0$,
$(j > 1)$. It follows that all roots of equation (16) are zero except one, the
non-vanishing root being $\rho^2 = \psi_{11}/(\psi_{11} + \nu\sigma_{11})$. If
each treatment denotes a separate population, hypothesis (3') is therefore
identical with Fisher's hypothesis [4], that the populations are collinear.

As a test criterion for this hypothesis Fisher has suggested the sum of the roots
of equation (21), excluding the highest root, i.e.,
$V' = \sum h_i = \sum r_i^2/(1 - r_i^2)$. If $n_1 \geq p$ the sum extends over
$(p - 1)$ roots, while if $n_1 < p$ the sum extends over $(n_1 - 1)$ roots. For
computational purposes it may be more expeditious to form this sum by
subtraction. Hsu [7] has pointed out that the sum of all roots is given by
$V = \sum_{i,j} b^{ij} a_{ij}$, which is obtained readily when the inverse of
$(b_{ij})$ has been calculated. The largest root of (21) is then found and
subtracted from $V$.
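
The subtraction procedure can be sketched as follows: invert the Error matrix, form $V$ as the sum of products of inverse elements and Treatments elements, then find the largest root and subtract it. Power iteration is used here for the largest root as an illustrative choice, not as the method of the paper; the function names and example matrices are assumptions, and the Treatments matrix is assumed non-zero.

```python
def inverse(matrix):
    """Gauss-Jordan inverse with partial pivoting (matrix assumed non-singular)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sum_of_lower_roots(a, b, iterations=500):
    """Return (V, largest root, V') for the equation |a - h b| = 0."""
    n = len(a)
    b_inv = inverse(b)
    # Sum of all roots: sum over i, j of b^{ij} a_{ij}.
    v_total = sum(b_inv[i][j] * a[i][j] for i in range(n) for j in range(n))
    # Largest root of b^{-1} a by power iteration.
    prod = [[sum(b_inv[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
    vec = [1.0] * n
    largest = 0.0
    for _ in range(iterations):
        new = [sum(prod[i][j] * vec[j] for j in range(n)) for i in range(n)]
        largest = sum(x * y for x, y in zip(new, vec)) / sum(y * y for y in vec)
        norm = max(abs(x) for x in new)
        vec = [x / norm for x in new]
    return v_total, largest, v_total - largest


# Hypothetical two-scale example.
v_total, largest, v_prime = sum_of_lower_roots([[4.0, 2.0], [2.0, 3.0]],
                                               [[2.0, 0.0], [0.0, 1.0]])
```

With two scales the result `v_prime` is simply the lower root of (21).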

Fisher [4] also suggested that when equations (3') hold, the distribution of
$V'$ is approximately that of $\chi^2$ with $(p - 1)(n_1 - 1)$ degrees of
freedom. This result has been confirmed by Hsu [8] as the limiting form of the
$V'$ distribution when $n_2$ tends to infinity. As in the case of two scales,
the small-sample distribution is as yet unknown; it presumably contains
$\rho_1$, the non-vanishing correlation, as a nuisance parameter.

Some progress towards the small-sample distribution can be made without
difficulty in the case where $\rho_1 = 1$. For then $v_1$ must have a zero Error
sum of squares in every sample from the population, i.e., $v_1$ is constant
within any given treatment. Consequently (i) $b_{1i} = 0$ for $i = 1, \cdots, p$,
and (ii) $a_{1i}^2/a_{11}$ is a single degree of freedom from the Treatments sum
of squares of $v_i$. On account of conditions (i), equation (21) reduces to

(26)    $\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{12} & a_{22} - h b_{22} & \cdots & a_{2p} - h b_{2p} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} - h b_{2p} & \cdots & a_{pp} - h b_{pp} \end{vmatrix} = 0$.

Subtract $a_{1i}/a_{11}$ times the first row from the $i$th row, for
$i = 2, \cdots, p$. We see that one root is infinite; the rest are the roots of
the equation

(27)    $| a''_{ij} - h b_{ij} | = 0$,

where $a''_{ij} = a_{ij} - a_{1i} a_{1j}/a_{11}$,    $i, j = 2, \cdots, p$.

If hypothesis (3') holds, the quantities $a''_{ij}$ follow the Wishart
distribution [16] with $(n_1 - 1)$ degrees of freedom. Hence the joint
distribution of $h_2, \cdots, h_p$, or $h_{n_1}$, is that which is obtained when
all the population canonical correlations vanish, with $(n_1 - 1)$ in place of
$n_1$. For $n_1 \geq p$, the distribution function (apart from the constant
term) is:

(28)    $\prod_{i=2}^{p} \left\{ h_i^{\frac{1}{2}(n_1 - p - 1)} (1 + h_i)^{-\frac{1}{2}(n_1 + n_2 - 1)} \right\} \prod_{i<j} (h_i - h_j)$.

For two scales, $(p = 2)$, we reach the result mentioned in section 6, that
$V' = h_2$ is distributed as $(n_1 - 1)e^{2z}/n_2$. This result can also be
obtained directly from (27). When $p = 3$, the distribution of $V'$ is
obtainable from a result by Hsu [7].

9. Measures of relative sensitivity. We propose to discuss briefly the
estimation of the relative sensitivity of two scales and to indicate the types of
distribution that are involved. If there are only two treatments, $t$, $t'$, an
appropriate definition of the true sensitivity of the $i$th scale is

(29)    $\dfrac{(\xi_{it} - \xi_{it'})^2}{2\sigma_{ii}}$

or some simple function of this quantity. In justification, we may observe
that for a fixed number of replicates, the power function of the $t$-test in the
$i$th scale depends entirely on this quantity. An unbiased sample estimate is

(30)    $\dfrac{(n_2 - 2)(\bar{x}_{it} - \bar{x}_{it'})^2}{2 b_{ii}} - \dfrac{1}{r'}$,

where $r'$ is the number of replicates. Since (30) involves a non-central
variance ratio, confidence limits for the true sensitivity can be found from
Fisher's Type C distribution, Fisher [3].

It follows from (3) and (29) that if two scales are linearly related (including
the case of equivalence) their relative sensitivity is constant for all treatment
comparisons. For scale 1 relative to scale 2 the sensitivity is measured by
$\phi = \beta^2\sigma_{22}/\alpha^2\sigma_{11}$.

If the scales are equivalent, apart possibly from a constant difference, this
quantity reduces to $\phi = \sigma_{22}/\sigma_{11}$, for which
$F = b_{22}/b_{11}$ serves as a sample estimate. A test of significance of the
sample ratio and confidence limits for the true ratio may be obtained from
Pitman [12], who showed that

(31)    $\dfrac{F - \phi}{\sqrt{(F + \phi)^2 - 4 r_{12}^2 F \phi}}$

follows the distribution of a sample correlation coefficient from $(n_2 + 1)$
pairs of observations. In (31), $r_{12}^2 = b_{12}^2/b_{11}b_{22}$. The same
procedure may be used whenever $\alpha$ and $\beta$ are known.

When $\alpha$ and $\beta$ are unknown, a sample estimate of the relative
sensitivity is $b^2 b_{22}/a^2 b_{11}$, where $(a x_1 + b x_2)$ is the
discriminant function which corresponds to the lower root of equation (21). We
have not been able to reach the distribution of this estimate. Confidence limits
for the relative sensitivity can, however, be obtained when $n_2$ is sufficiently
large so that $\sigma_{11}$ and $\sigma_{22}$ may be assumed known. For in that
case the problem reduces to that of finding confidence limits for
$\beta^2/\alpha^2$. Now if $\alpha$, $\beta$ are the true coefficients, the
quantity

(32)    $\dfrac{\alpha^2 a_{11} + 2\alpha\beta a_{12} + \beta^2 a_{22}}{\alpha^2 b_{11} + 2\alpha\beta b_{12} + \beta^2 b_{22}}$

follows the $n_1 e^{2z}/n_2$ distribution. Any proposed values of $\alpha$ and
$\beta$ which make (32) significant are rejected by the evidence of the sample.
By equating (32) to the desired significance level of $n_1 e^{2z}/n_2$, we get a
quadratic equation for the two limits of $\beta/\alpha$. The limits will not be
narrow unless the treatment effects are large.
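
The quadratic for the limits of $\beta/\alpha$ can be sketched as below. Writing $t = \beta/\alpha$ and $c$ for the chosen significance level of $n_1 e^{2z}/n_2$, equating the ratio to $c$ gives $(a_{22} - c b_{22})t^2 + 2(a_{12} - c b_{12})t + (a_{11} - c b_{11}) = 0$. The function name is an illustrative assumption, and the critical variance ratio `f_crit` (the tabulated value of $e^{2z}$ for $n_1$ and $n_2$ degrees of freedom) must be supplied by the user.

```python
import math


def ratio_limits(a, b, n1, n2, f_crit):
    """Roots t = beta/alpha of the quadratic obtained by equating the ratio of
    quadratic forms to c = n1 * f_crit / n2. Returns None when the quadratic
    degenerates or has no real roots."""
    c = n1 * f_crit / n2
    qa = a[1][1] - c * b[1][1]
    qb = 2 * (a[0][1] - c * b[0][1])
    qc = a[0][0] - c * b[0][0]
    disc = qb * qb - 4 * qa * qc
    if qa == 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return tuple(sorted(((-qb - root) / (2 * qa), (-qb + root) / (2 * qa))))


# Hypothetical matrices and a made-up critical value f_crit = 5.
limits = ratio_limits(a=[[4.0, 2.0], [2.0, 3.0]],
                      b=[[2.0, 0.0], [0.0, 1.0]],
                      n1=4, n2=10, f_crit=5.0)
```

When the leading coefficient $a_{22} - c b_{22}$ is positive, the values of $\beta/\alpha$ not rejected by the sample lie between the two roots; when it is negative they lie outside them.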

If the relation between the scales is non-linear, and the assumption of a
constant error variance throughout an individual scale is valid, the relative
sensitivity differs for different treatment comparisons. Even in this event
estimates of relative sensitivity may be of interest. Attention might be
restricted to a single degree of freedom from the treatment comparisons, in
which case the definition for two treatments could be applied.

Alternatively an estimate might be wanted of the average relative sensitivity
over all treatment comparisons. For a given number of replicates, the power
function of the variance-ratio test of the treatment effects in the $i$th scale
depends only on the quantity

(33)    $\dfrac{\sum_t (\xi_{it} - \bar{\xi}_i)^2}{n_1 \sigma_{ii}}$.





Consequently this quantity, which is an extension of (29), might be chosen as
a measure of average sensitivity. The corresponding generalization of the
unbiased sample estimate (30) is

(34)    $\dfrac{(n_2 - 2) a_{ii}}{n_1 r' b_{ii}} - \dfrac{1}{r'}$.

Since the quantity $a_{ii}/b_{ii}$ is a multiple of a non-central variance ratio,
the comparison of two scales involves a test of significance of the hypothesis
that two non-central variance ratios are equal.
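
The two sample estimates of sensitivity are simple arithmetic. The sketch below writes them as functions, assuming the two-treatment estimate takes the form $(n_2 - 2)d^2/2b_{ii} - 1/r'$ with $d$ the difference of the two treatment means, and the average estimate the form $(n_2 - 2)a_{ii}/n_1 r' b_{ii} - 1/r'$; the function and argument names are illustrative.

```python
def sensitivity_two_treatments(mean_t, mean_t2, b_ii, n2, r):
    """Estimate for two treatments from their means on scale i, the Error sum
    of squares b_ii with n2 d.f., and r replicates."""
    return (n2 - 2) * (mean_t - mean_t2) ** 2 / (2 * b_ii) - 1 / r


def average_sensitivity(a_ii, b_ii, n1, n2, r):
    """Average estimate over all treatment comparisons, from the Treatments
    sum of squares a_ii with n1 d.f."""
    return (n2 - 2) * a_ii / (n1 * r * b_ii) - 1 / r


# With two treatments (n1 = 1) the Treatments sum of squares is r * d**2 / 2,
# so the two estimates coincide. Hypothetical numbers:
r, n2, b_ii = 4, 12, 10.0
d = 5.0 - 3.0
two = sensitivity_two_treatments(5.0, 3.0, b_ii, n2, r)
avg = average_sensitivity(r * d ** 2 / 2, b_ii, 1, n2, r)
```

The agreement of the two values for a single degree of freedom is a check that the average measure extends the two-treatment definition.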


10. Summary. This paper discusses the analysis of data obtained when the 
results of a replicated experiment are measured on several different scales which 
we wish to compare. Recent work in multivariate analysis provides tests of 
the hypothesis that the treatment effects are the same in all scales, and of the 
hypothesis that the scales are linearly related. When the number of Error 
degrees of freedom is large, the significance levels of these tests are obtainable 
from the standard tables. For small sample tests, further investigation and
tabulation of certain distributions will be needed, particularly that of the sample
canonical correlations when one population correlation differs from zero.

A brief discussion is given of methods for comparing the relative sensitivity of
two scales.

ON STOCHASTIC LIMIT AND ORDER RELATIONSHIPS

By H. B. Mann and A. Wald 1 
Columbia University 

1. Introduction. The concept of a stochastic limit is frequently used in
statistical literature. Writers of papers on problems in statistics and probability
usually prove only those special cases of more general theorems which are
necessary for the solution of their particular problems. Thus readers of
statistical papers are confronted with the necessity of laboriously ploughing
through details, a task which is made more difficult by the fact that no uniform
notation has as yet been introduced. It is therefore the purpose of the present
paper to outline a systematic theory of stochastic limit and order relationships
and at the same time to propose a convenient notation analogous to the notation
of ordinary limit and order relationships. The theorems derived in this paper are
of a more general nature and seem to contain, to the authors' knowledge, all
previous results in the literature. For instance the so-called $\delta$-method
for the derivation of asymptotic standard deviations and limit distributions,
also two lemmas by J. L. Doob [1] on products, sums and quotients of random
variables and a theorem derived by W. G. Madow [2] are special cases of our
results. It is hoped that such a general theory together with a convenient
notation will considerably facilitate the derivation of theorems concerning
stochastic limits and limit distributions. In section 2 we define the notion of
convergence in probability and that of stochastic order and derive 5 theorems of
a very general nature. Section 3 contains 2 corollaries of these general theorems
which have so far been most important in applications.

We shall frequently need the concept of a vector. A vector
$a = (a^1, \cdots, a^r)$ is an ordered set of $r$ numbers $a^1, \cdots, a^r$.
The numbers $a^1, \cdots, a^r$ are called the components of $a$. If the
components are random variables then the vector is called a random vector.

We shall generally denote by $a$, $b$ constant vectors, by $x$, $y$ random
vectors and by $a^1, \cdots, a^r$, $x^1, \cdots, x^r$ their components. Differing
from the usual practice we shall put $|a| = (|a^1|, \cdots, |a^r|)$ and we shall
write $a < b$ or $a \leq b$ if $a^i < b^i$ or $a^i \leq b^i$ for every $i$. This
notation saves a great amount of writing, since all our theorems except
theorem 4 are valid for sequences of any number of jointly distributed variates.

We shall review here the ordinary order notation. In all that follows let $f(N)$
be a positive function defined for all positive integers $N$.


1 Research under a grant-in-aid from the Carnegie Corporation of New York.

We write

$a_N = o[f(N)]$ if $\lim_{N \to \infty} a_N/f(N) = 0$.

$a_N = O[f(N)]$ if $|a_N| \leq M f(N)$ for all $N$ and a fixed $M > 0$.

$a_N = \Omega[f(N)]$ if $0 < M' f(N) \leq |a_N| \leq M f(N)$ for almost all $N$
and for two fixed numbers $M > M' > 0$.

$a_N = \omega[f(N)]$ if $0 < M f(N) \leq |a_N|$ for almost all $N$ and a fixed
$M > 0$.

For instance, $\log N = o(N^\epsilon)$ for every $\epsilon > 0$, or
$\sin N/N = O(1/N)$, $(3 + 4N)/(4 + 8\sqrt{N}) = \Omega(\sqrt{N})$,
$5/\sin N = \omega(1)$.
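
The four examples can be checked numerically by evaluating the ratio $|a_N|/f(N)$ over a range of $N$. This is only an illustration over a finite range, not a proof; the function names and the bounds 0.4 and 0.6 are choices made for this sketch.

```python
import math


def ratio_small_o(n, eps=0.5):
    """log N / N**eps, which should tend to 0."""
    return math.log(n) / n ** eps


def ratio_big_o(n):
    """|sin N / N| divided by 1/N, which should stay bounded above."""
    return abs(math.sin(n) / n) / (1 / n)


def ratio_capital_omega(n):
    """(3 + 4N)/(4 + 8 sqrt N) divided by sqrt N, bounded above and below."""
    return ((3 + 4 * n) / (4 + 8 * math.sqrt(n))) / math.sqrt(n)


def ratio_small_omega(n):
    """|5 / sin N| divided by 1, which should stay bounded below."""
    return abs(5 / math.sin(n))


checks = {
    "o": ratio_small_o(10 ** 6) < 0.02,
    "O": all(ratio_big_o(n) <= 1 + 1e-12 for n in range(1, 1001)),
    "Omega": all(0.4 <= ratio_capital_omega(n) <= 0.6 for n in range(1, 10001)),
    "omega": all(ratio_small_omega(n) >= 5 for n in range(1, 1001)),
}
```

The third ratio approaches 1/2 as $N$ grows, which is why fixed bounds on either side of 1/2 can be used.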

For any statement $V$ we shall denote by $P(V)$ the probability that $V$ holds.

2. General theorems on stochastic limit and order relationships.

Definition 1. We write $\operatorname{plim}_{N \to \infty} x_N = 0$ (in words,
$x_N$ converges in probability to 0 with increasing $N$) if for every
$\epsilon > 0$, $\lim_{N \to \infty} P(|x_N| \leq \epsilon) = 1$. Further
$\operatorname{plim}_{N \to \infty} x_N = x$ if
$\operatorname{plim}_{N \to \infty} (x_N - x) = 0$.

Definition 2. We write $x_N = o_p[f(N)]$ ($x_N$ is of probability order
$o[f(N)]$) if $\operatorname{plim}_{N \to \infty} x_N/f(N) = 0$.

Definition 3. We write $x_N = O_p[f(N)]$ ($x_N$ is of probability order
$O[f(N)]$) if for each $\epsilon > 0$ there exists an $A_\epsilon > 0$ such that
$P(|x_N| \leq A_\epsilon f(N)) \geq 1 - \epsilon$ for all values of $N$.

Definition 4. $x_N = \Omega_p[f(N)]$ if for each $\epsilon > 0$ there exist two
numbers $A_\epsilon > 0$ and $B_\epsilon > 0$ and an integer $N_\epsilon$ such
that $P[A_\epsilon f(N) \leq |x_N| \leq B_\epsilon f(N)] \geq 1 - \epsilon$ for
all $N > N_\epsilon$.

Definition 5. $x_N = \omega_p[f(N)]$ if for every $\epsilon > 0$ there exists an
$A_\epsilon > 0$ and an integer $N_\epsilon$ such that
$P[A_\epsilon f(N) \leq |x_N|] \geq 1 - \epsilon$ for all $N > N_\epsilon$.
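
Definition 3 can be illustrated with an example that is not in the paper: let $x_N$ be the proportion of heads in $N$ tosses of a fair coin, minus one half. Chebyshev's inequality suggests the constant $A_\epsilon = 1/(2\sqrt{\epsilon})$ with $f(N) = N^{-1/2}$, and the probability in the definition can then be computed exactly from the binomial distribution. The function names are illustrative assumptions.

```python
import math


def prob_within(n, bound):
    """Exact P(|S_n / n - 1/2| <= bound) for S_n ~ Binomial(n, 1/2)."""
    total = sum(math.comb(n, k) for k in range(n + 1)
                if abs(k / n - 0.5) <= bound)
    return total / 2 ** n


def a_eps(eps):
    """Chebyshev constant A with P(|x_N| <= A * N**-0.5) >= 1 - eps."""
    return 1 / (2 * math.sqrt(eps))


eps = 0.1
# The bound of Definition 3 holds for every N checked, so x_N = O_p(N**-0.5).
holds_for_all = all(prob_within(n, a_eps(eps) / math.sqrt(n)) >= 1 - eps
                    for n in range(1, 201))
# A much smaller multiple of N**-0.5 captures little probability at N = 200.
small_constant_prob = prob_within(200, 0.1 / math.sqrt(200))
```

The second quantity remains well below one, which is consistent with $x_N$ not being of smaller probability order than $N^{-1/2}$, although a finite check of this kind cannot establish that.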

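Let $E$ denote a vector space. For any subset $E'$ of $E$ the symbol
$a \subset E'$ will mean that $a$ is an element of $E'$.

Since $P(x \subset E_1\ \&\ x \subset E_2) \geq P(x \subset E_1) - P(x \not\subset E_2)$
we evidently have

Lemma 1. If $P(x \subset E_1) \geq 1 - \epsilon$, $P(x \subset E_2) \geq 1 - \epsilon'$
then $P(x \subset E_1;\ x \subset E_2) \geq 1 - \epsilon - \epsilon'$.

We now put $O^1 = o$, $O^2 = O$, $O^3 = \Omega$, $O^4 = \omega$.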

Theorem 1. For every $\epsilon > 0$ let $\{R_N(\epsilon)\}$ be a sequence of
subsets of the $r$-dimensional Cartesian space such that
$P(x_N \subset R_N(\epsilon)) \geq 1 - \epsilon$ for all $N$ greater than a
certain integer $N_\epsilon$. Let $\{g_N(x)\}$ be a sequence of functions of
$x = (x^1, x^2, \cdots, x^r)$ such that $g_N(a_N) = O^i[f(N)]$ for any
$\epsilon > 0$ and for any sequence $\{a_N\}$ for which
$a_N \subset R_N(\epsilon)$. Then we have $g_N(x_N) = O^i_p[f(N)]$.

Proof: For $i = 1, 2, 3$, there exists a positive integer $N_\epsilon$ such that
$|g_N(a)|$ is a bounded function of $a$ in $R_N(\epsilon)$ for $N > N_\epsilon$.
For otherwise we could construct a sequence $\{a_N\}$ with $a_N$ in
$R_N(\epsilon)$ such that $|g_N(a_N)| > M f(N)$ for any $M$ and for infinitely
many values of $N$, which contradicts the hypothesis of our theorem. Hence there
exists an $N_\epsilon$ such that for $N > N_\epsilon$ the function $|g_N(a)|$ is
bounded in $R_N(\epsilon)$. Let $M_N(\epsilon)$ be the l.u.b. of
$|g_N(a)|/f(N)$ in $R_N(\epsilon)$. We can construct a sequence $\{a_N\}$ with
$a_N \subset R_N(\epsilon)$ such that $|g_N(a_N)|/f(N) \geq M_N(\epsilon)/2$ for
all $N > N_\epsilon$. Hence for $i = 2, 3$ the sequence $M_N(\epsilon)$ must be
bounded and for $i = 1$ we must have $\lim_{N \to \infty} M_N(\epsilon) = 0$.
Let $\overline{M}(\epsilon)$ be the l.u.b. of $M_N(\epsilon)$. For $i = 3, 4$ one
shows in exactly the same manner the existence of a g.l.b.
$\underline{M}(\epsilon)$ of $|g_N(a)|/f(N)$ if $a \subset R_N(\epsilon)$ and for
$N > N_\epsilon$. Hence for sufficiently large $N$ we have

$P[\,|g_N(x_N)| \leq M_N(\epsilon) f(N)\,] \geq 1 - \epsilon$ with $\lim_{N \to \infty} M_N(\epsilon) = 0$ for $i = 1$,

$P[\,|g_N(x_N)| \leq \overline{M}(\epsilon) f(N)\,] \geq 1 - \epsilon$ for $i = 2$,

$P[\,\underline{M}(\epsilon) f(N) \leq |g_N(x_N)| \leq \overline{M}(\epsilon) f(N)\,] \geq 1 - \epsilon$ for $i = 3$,

$P[\,\underline{M}(\epsilon) f(N) \leq |g_N(x_N)|\,] \geq 1 - \epsilon$ for $i = 4$.

For $i = 2$ the existence of an $M'(\epsilon)$ such that
$P[\,|g_N(x_N)| \leq M'(\epsilon) f(N)\,] \geq 1 - \epsilon$ for all $N$ follows
easily from this result. Hence our theorem is proved.

Corollary 1. If $x_N^j = O_p^{i_j}[f_j(N)]$ for $j = 1, 2, \ldots, r$ and $\{R_N(\epsilon)\}$ is a sequence
of subsets of the k-dimensional space $y^1, y^2, \ldots, y^k$ such that $P[y_N \subset R_N(\epsilon)] \ge 1 - \epsilon$
for sufficiently large $N$, and if $\{g_N(x^1, \ldots, x^r, y^1, \ldots, y^k)\}$ is a sequence
of functions of $x^1, x^2, \ldots, x^r, y^1, y^2, \ldots, y^k$ such that for any $\epsilon > 0$ we have
$g_N(a_N, b_N) = O^i[f(N)]$ for every sequence $\{a_N, b_N\}$ with $a_N^j = O^{i_j}[f_j(N)]$
$(j = 1, 2, \ldots, r)$ and $b_N \subset R_N(\epsilon)$, then $g_N(x_N, y_N) = O_p^i[f(N)]$.

Proof: It follows from Lemma 1, the definition of the relation $x_N^j = O_p^{i_j}[f_j(N)]$
and the hypothesis of our corollary that for any $\epsilon > 0$ there exists a sequence
of subsets $\{R'_N(\epsilon)\}$ of the space $x^1, \ldots, x^r, y^1, \ldots, y^k$ which satisfies the
conditions of Theorem 1 with respect to the sequence of functions $\{g_N\}$. Hence
Corollary 1 is an immediate consequence of Theorem 1.

Corollary 1 implies inter alia that all operational rules for the ordinary order 
and limit relations are also applicable to stochastic limit and order relations. 
For instance $o[f(N)]/\Omega[g(N)] = o[f(N)/g(N)]$. Hence also
$o_p[f(N)]/\Omega_p[g(N)] = o_p[f(N)/g(N)]$.

Definition 6. For any $N$ let $R_N$ be a region, $f_N(a)$ a function defined on $R_N$.
The sequence $\{f_N(a)\}$ will be said to be uniformly continuous with respect to $\{R_N\}$ if
the following condition is fulfilled. For every $\epsilon > 0$ there exists a vector $\delta_\epsilon > 0$
such that for almost all $N$

$|f_N(a + \delta) - f_N(a)| \le \epsilon$ for any $|\delta| \le \delta_\epsilon$, and for any $a \subset R_N$.

Theorem 2. Let $\operatorname{plim}_{N \to \infty} (x_N - y_N) = 0$. For every $\epsilon > 0$ let $\{R_N(\epsilon)\}$ be a
sequence of subsets of the r-dimensional vector space such that for almost all $N$ we have
$P[y_N \subset R_N(\epsilon)] \ge 1 - \epsilon$. If the sequence of functions $\{f_N(a)\}$ is uniformly
continuous with respect to $\{R_N(\epsilon)\}$ for every $\epsilon > 0$, then
$\operatorname{plim}_{N \to \infty} [f_N(x_N) - f_N(y_N)] = 0$.

Proof: We have $f_N(x_N) - f_N(y_N) = f_N(y_N + z_N) - f_N(y_N)$ where $z_N^j = o_p(1)$
for $j = 1, \ldots, r$. Because of the uniform continuity of $f_N(a)$ with respect to
$R_N(\epsilon)$ we see that for every sequence $\{a_N, b_N\}$ with $a_N \subset R_N(\epsilon)$ and $b_N^j = o(1)$
$(j = 1, 2, \ldots, r)$,

$f_N(a_N + b_N) - f_N(a_N) = o(1)$.

Hence Theorem 2 follows from Corollary 1.

In the following we shall abbreviate “cumulative distribution function” by d.f. 
Definition 7. Let $\{x_N\}$ be a sequence of random variables. Let $F_N$ be the d.f.
of $x_N$. Let $x$ have the distribution $F$. We shall write $d_\infty(x_N) = d(x)$ if
$\lim_{N \to \infty} F_N = F$ in every continuity point of $F$.

Theorem 3. Let $\operatorname{plim}_{N \to \infty} (x_N - y_N) = 0$ and $d_\infty(y_N) = d(y)$; then
$d_\infty(x_N) = d(y)$.

Proof: Let $G_N$, $F_N$ be the d.f.'s of $x_N$, $y_N$ resp. For any $\delta > 0$ we have

$P(y_N < a + \delta) \ge P(x_N < a;\; y_N < a + \delta) \ge P(x_N < a;\; |y_N - x_N| < \delta)$
$\qquad \ge P(x_N < a) - P(|y_N - x_N| \ge \delta)$,

$P(x_N < a) \ge P(x_N < a;\; y_N < a - \delta) \ge P(y_N < a - \delta) - P(|y_N - x_N| \ge \delta)$.

Hence since $P(y_N < a) = F_N(a)$, $P(x_N < a) = G_N(a)$, $\lim P(|x_N - y_N| \ge \delta) = 0$,
we have

$\limsup F_N(a + \delta) \ge \limsup G_N(a) \ge \liminf G_N(a) \ge \liminf F_N(a - \delta)$.

If $a + \delta$ and $a - \delta$ are continuity points of $F$ we have

$F(a + \delta) \ge \limsup G_N(a) \ge \liminf G_N(a) \ge F(a - \delta)$.

For any $\delta_0 > 0$ there exists a positive $\delta < \delta_0$ such that $a - \delta$ and $a + \delta$ are
continuity points of $F$. Hence we can choose $\delta$ arbitrarily small and if $a$ is a
continuity point of $F$ we must have

$\lim G_N(a) = F(a)$.

Theorem 4. Let $x_N$, $y_N$ be two sequences of one-dimensional vectors and let
$\operatorname{plim}_{N \to \infty} (x_N - y_N) = 0$. Let $F_N$, $G_N$ be the cumulative distribution functions of
$x_N$ and $y_N$ respectively. Let $R_N(\epsilon)$ be the set of points $a$ for which
$|F_N(a) - G_N(a)| > \epsilon$. Let $M_N(\epsilon)$ be the Lebesgue measure of this set. Then
$\lim_{N \to \infty} M_N(\epsilon) = 0$ for every $\epsilon > 0$.

We first prove the following lemma.

Lemma 2. Let $\delta$, $\epsilon$ be any arbitrary positive numbers and let $f$ be a distribution
function. The set of points $a$ for which $f(a + \delta) - f(a) \ge \epsilon$ has at most the Lebesgue
measure $\delta/\epsilon$.

Proof: The points $a$ for which $f(a + \delta) - f(a) \ge \epsilon$ must have a lower bound.
Otherwise we could find infinitely many such points whose distance from each
other is more than $\delta$. But this contradicts the requirement that $f(\infty) = 1$.
Let $a_1$ be the g.l.b. of the $a$'s. Then for any $\eta > 0$ in the interval
$(a_1 \le x \le a_1 + \delta + \eta)$ the value of $f$ increases at least by the amount $\epsilon$. Let now $a_2$ be
the g.l.b. of the $a$'s outside of this interval. We continue our construction by
constructing the interval $(a_2 \le x \le a_2 + \delta + \eta)$ and so forth. But after at most
$1/\epsilon$ such steps the construction must stop. Hence all points $a$ for which
$f(a + \delta) - f(a) \ge \epsilon$ are contained in at most $1/\epsilon$ intervals of length $\delta + \eta$. Hence since
$\eta$ was arbitrary the Lebesgue measure of this set is at most $\delta/\epsilon$.
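
The bound of Lemma 2 can be checked numerically. The following is a minimal sketch, not part of the original argument: it takes the standard normal c.d.f. as an assumed example of $f$, and the helper names `normal_cdf` and `increment_set_measure`, the grid limits and the step count are all illustrative choices. It approximates the Lebesgue measure of the set $\{a : f(a + \delta) - f(a) \ge \epsilon\}$ on a grid and compares it with $\delta/\epsilon$.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal c.d.f., used here only as an example distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def increment_set_measure(cdf, delta, eps, lo=-10.0, hi=10.0, steps=20_000):
    """Approximate the Lebesgue measure of {a in [lo, hi): cdf(a+delta) - cdf(a) >= eps}.

    The interval [lo, hi) is cut into `steps` cells; a cell is counted when its
    left endpoint belongs to the set.
    """
    h = (hi - lo) / steps
    hits = 0
    for k in range(steps):
        a = lo + k * h
        if cdf(a + delta) - cdf(a) >= eps:
            hits += 1
    return hits * h


if __name__ == "__main__":
    delta, eps = 0.5, 0.1
    measure = increment_set_measure(normal_cdf, delta, eps)
    print(f"approximate measure = {measure:.3f}, bound delta/eps = {delta / eps:.3f}")
```

For a smooth c.d.f. the bound is far from sharp; it is approached only by distributions that concentrate their mass in about $1/\epsilon$ jumps of size $\epsilon$.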

We come now to the proof of our theorem. We have

$P(x_N < a) \ge P(x_N < a;\; y_N < a + \delta) \ge P(x_N < a) - P(|x_N - y_N| \ge \delta)$,

$P(y_N < a + \delta) \ge P(x_N < a;\; y_N < a + \delta) \ge P(y_N < a + \delta)$
$\qquad - P(|x_N - y_N| \ge \delta) - P(a \le x_N < a + 2\delta)$.

Therefore

$P(x_N < a;\; y_N < a + \delta) = P(x_N < a) - \theta_N P(|x_N - y_N| \ge \delta)$
$\qquad = P(y_N < a + \delta) - \theta'_N P(|x_N - y_N| \ge \delta) - \theta''_N P(a \le x_N < a + 2\delta)$,

where $0 \le \theta_N \le 1$, $0 \le \theta'_N \le 1$, $0 \le \theta''_N \le 1$. Hence

$P(y_N < a + \delta) = P(x_N < a) + \vartheta_N P(|x_N - y_N| \ge \delta) + \vartheta'_N [F_N(a + 2\delta) - F_N(a)]$

where $|\vartheta_N|, |\vartheta'_N| \le 1$.

By hypothesis we have $P(|x_N - y_N| \ge 1/m) \le 1/m$ for almost all $N$ and
every integer $m$. Hence we can choose a sequence $\{\delta_N\}$ with $\delta_N > 0$ in such a
way that $\lim_{N \to \infty} \delta_N = 0$, $\lim_{N \to \infty} P(|x_N - y_N| \ge \delta_N) = 0$. We can then choose $N_\epsilon$
so that $P(|x_N - y_N| \ge \delta_N) \le \epsilon/3$ for $N \ge N_\epsilon$. Applying Lemma 2 we see
that except for a set of measure at most $6\delta_N/\epsilon$ we have
$F_N(a + 2\delta_N) - F_N(a) \le \epsilon/3$. Similarly the set of points for which
$G_N(a + \delta_N) - G_N(a) \ge \epsilon/3$ has at most the Lebesgue measure $3\delta_N/\epsilon$. Hence,
except in a set of points whose measure is at most $9\delta_N/\epsilon$, we have

$|G_N(a) - F_N(a)| \le \epsilon$,

and this completes the proof of Theorem 4.
Theorem 4a. Let $\operatorname{plim}_{N \to \infty} (x_N - y_N) = 0$. Let $F_N$, $G_N$ be the distribution
functions of $x_N$, $y_N$ respectively. Furthermore, let $R_N(\epsilon)$ be the set of points inside an
r-dimensional cube where $|F_N - G_N| > \epsilon$ and let $M_N(\epsilon)$ be the Lebesgue measure
of $R_N(\epsilon)$; then $\lim_{N \to \infty} M_N(\epsilon) = 0$.

We prove first

Lemma 2a. Let $\delta = (\delta^1, \delta^2, \ldots, \delta^r) > 0$ and $\max \delta^i = d$. Let $I$ be the cube
defined by $(-A \le x^i \le A,\; i = 1, 2, \ldots, r)$. Let furthermore $f$ be a d.f. Then
the Lebesgue measure of the points $a \subset I$ for which $f(a + \delta) - f(a) \ge \epsilon$ is at most
$d r^2 A^{r-1}/\epsilon$.

Proof: Let $f_1(x^1), f_2(x^2), \ldots, f_r(x^r)$ be the marginal distributions of $x^1, x^2, \ldots,
x^r$ respectively. It follows from Lemma 2 that the linear Lebesgue measure of
those numbers $a^i$ for which $f_i(a^i + \delta^i) - f_i(a^i) \ge \epsilon/r$ is smaller than $rd/\epsilon$. We
form the set $(x^i = a^i \;\&\; x \subset I)$ for every such $a^i$ and for $i = 1, 2, \ldots, r$. The
Lebesgue measure of the sum $R(\epsilon)$ of all these sets is at most $r^2 d A^{r-1}/\epsilon$. We
shall show that $R(\epsilon)$ contains all points $a$ inside $I$ for which $f(a + \delta) - f(a) \ge \epsilon$.
We have

$f(a^1 + \delta^1, a^2 + \delta^2, \ldots, a^r + \delta^r) - f(a^1, a^2, \ldots, a^r) = \Delta_1 + \Delta_2 + \cdots + \Delta_r$,

where $\Delta_i = f(a^1, a^2, \ldots, a^{i-1}, a^i + \delta^i, \ldots, a^r + \delta^r) - f(a^1, \ldots, a^i, a^{i+1} + \delta^{i+1}, \ldots,
a^r + \delta^r)$. If $f(a + \delta) - f(a) \ge \epsilon$ then we must have for at least one $i$

$\Delta_i \ge \epsilon/r$.

But $\Delta_i$ is the probability of a subset of the set $T = (a^i \le x^i < a^i + \delta^i)$ and
$f_i(a^i + \delta^i) - f_i(a^i)$ is the probability of $T$ itself. Hence

$\epsilon/r \le \Delta_i \le f_i(a^i + \delta^i) - f_i(a^i)$,

and if $(a^1, a^2, \ldots, a^r)$ is in $I$ then it is contained in $R(\epsilon)$. Hence Lemma 2a is
proved.

The proof of Theorem 4a using Lemma 2a is similar to that of Theorem 4 and 
therefore it is omitted. 

The Jordan measure of a set $R$ with respect to the distribution function $F$ is
defined as follows. We consider only intervals whose boundary points are
continuity points of $F$. We cover $R$ with the sum $I$ of a finite number of intervals.
(The intervals themselves may also be infinite. For instance the sets
$a < x < \infty$, $a \le x < \infty$ are also considered intervals.) We consider
$M(I) = \int_I dF$ for every $I$ covering $R$. The g.l.b. of all such $M(I)$ is called the exterior
Jordan measure $\overline{M}(R)$ of $R$. Similarly we consider all sums $I$ of a finite number
of intervals which are contained in $R$. The l.u.b. of $\int_I dF$ is called the interior
Jordan measure $\underline{M}(R)$ of $R$. If $\overline{M}(R) = \underline{M}(R)$ then $M(R)$ is called the Jordan
measure of $R$.

Lemma 3. Let $F_N(x)$ be a sequence of d.f.'s such that $\lim_{N \to \infty} F_N(x) = F(x)$ in every
continuity point of $F(x)$. Let $h(x)$ be a bounded function such that the discontinuity
points of $h(x)$ have the Jordan measure 0 with respect to $F$ and such that
$\int_{-\infty}^{+\infty} h(x)\,dF_N(x)$ and $\int_{-\infty}^{+\infty} h(x)\,dF(x)$ exist. Then

(1) $\lim_{N \to \infty} \int_{-\infty}^{+\infty} h(x)\,dF_N(x) = \int_{-\infty}^{+\infty} h(x)\,dF(x)$.


Proof: There is only an enumerable set of hyperplanes parallel to the plane
$x^i = 0$ which have positive probability with respect to $F$. Hence we can find
for every $\delta$ an interval net whose cells have a diameter at most $\delta$ and such that
the boundary points of every cell are continuity points of $F$.

We first determine a closed finite interval $I$ such that $\int_I dF(x) \ge 1 - \epsilon/2$ and
such that the boundary points of $I$ are continuity points of $F$. We further
determine a sum $I'$ of a finite number of open intervals such that $I'$ contains all
discontinuity points of $h$, $\int_{I'} dF(x) \le \epsilon/2$ and such that the boundary of $I'$ does
not contain any discontinuity points of $F$. All this is possible by hypothesis
and because the set of hyperplanes with positive probability is enumerable.
Let $R$ be the subset of $I$ consisting of all points of $I$ which are not contained in $I'$.
$R$ is a closed set and can be decomposed into a finite number of intervals. The
function $h$ is continuous in $R$ and therefore uniformly continuous. We can
therefore cover $R$ by a finite set of intervals such that the variation of $h$ in every
interval is less than $\epsilon$ and such that the boundary points of each interval are
continuity points of $F$. Let $I_1, I_2, \ldots, I_l$ be such a finite set of intervals. Let
$x_j$ be any point in $I_j$. Writing $\bar{R}$ for the complement of $R$, we have

\[
\begin{aligned}
|H_N| &= \left| \int_{-\infty}^{+\infty} h(x)\,dF_N(x) - \int_{-\infty}^{+\infty} h(x)\,dF(x) \right| \\
&= \Bigl| \sum_{j=1}^{l} \int_{I_j} [h(x) - h(x_j)]\,dF_N(x) - \sum_{j=1}^{l} \int_{I_j} [h(x) - h(x_j)]\,dF(x) \\
&\qquad + \sum_{j=1}^{l} h(x_j) \Bigl[ \int_{I_j} dF_N(x) - \int_{I_j} dF(x) \Bigr] + \int_{\bar{R}} h(x)\,dF_N(x) - \int_{\bar{R}} h(x)\,dF(x) \Bigr| \\
&\le \epsilon + \epsilon + \Bigl| \sum_{j=1}^{l} h(x_j) \Bigl[ \int_{I_j} dF_N(x) - \int_{I_j} dF(x) \Bigr] \Bigr| + \max |h(x)| \int_{\bar{R}} dF_N(x) + \epsilon \max |h(x)| .
\end{aligned}
\]

But $\lim_{N \to \infty} \int_{I_j} dF_N(x) = \int_{I_j} dF(x)$ for every $j$ and $\lim_{N \to \infty} \int_R dF_N(x) \ge 1 - \epsilon$.

Hence

$\limsup |H_N| \le 2\epsilon + 2\epsilon \max |h(x)|$.

Since $\epsilon$ was arbitrary, we must have $\lim_{N \to \infty} H_N = 0$.

We are now prepared to prove 

Theorem 5. Let $d_\infty(x_N) = d(x)$. Let $g(x)$ be a Borel measurable function
such that the set $R$ of discontinuity points of $g(x)$ is closed and $P(x \subset R) = 0$.
Then $d_\infty[g(x_N)] = d[g(x)]$.

Proof: Let $F_N$ be the d.f. of $x_N$, $F$ the d.f. of $x$, $F_{Ng}$, $F_g$ the d.f.'s of $g(x_N)$,
$g(x)$ resp. Then $\lim_{N \to \infty} F_N = F$ in every continuity point of $F$. Let $h(x)$ be defined
as follows:

$h(x) = 1$ if $g(x) < a$,
$h(x) = 0$ if $g(x) \ge a$.

The discontinuities of $h$ are contained in the set $M$ of all points where $g(x) = a$
and is continuous or where $g(x)$ is discontinuous. The set $R$ of discontinuity
points of $g(x)$ is closed and of measure 0 with respect to $F$. We can therefore
subtract from $M$ a sum $R^*$ of a finite number of open intervals of arbitrarily
small measure with respect to $F$ which contains all discontinuity points of $g(x)$.
This difference set $M'$ is closed and contains only points where $g(x) = a$ and
$x \not\subset R$. If $a$ is a continuity point of $F_g$ then the Borel measure of $M'$ with respect
to $F$ is 0. Since $M'$ is closed, its Jordan measure is also 0. Hence the Jordan
measure of the discontinuity points of $h(x)$ is 0 if $a$ is a continuity point of $F_g$.
Moreover $\int_{-\infty}^{+\infty} h(x)\,dF_N(x) = F_{Ng}(a)$ and $\int_{-\infty}^{+\infty} h(x)\,dF(x) = F_g(a)$
exist for every $a$. Hence by Lemma 3 $\lim_{N \to \infty} F_{Ng}(a) = F_g(a)$ in every
continuity point of $F_g$ and this proves our theorem.

3. Corollaries and applications. Corollary 2. If $\operatorname{plim} (x_N - y_N) = 0$,
$d_\infty(y_N) = d(y)$ and if $f$ is continuous except in a set $R$ for which
$\lim_{N \to \infty} P(y_N \subset R) = 0$, then $\operatorname{plim} [f(x_N) - f(y_N)] = 0$.

Proof: Let $I$ be a closed interval such that $P(y_N \subset I) \ge 1 - \epsilon/2$. Let $I'$ be
a sum of open intervals containing all discontinuity points of $f(x)$ in $I$ and such
that $P(y_N \subset I') \le \epsilon/2$ for sufficiently large $N$. The set $J$ of points of $I$ which
are not points of $I'$ is a closed set. Hence $f$ is uniformly continuous in $J$ and
$P(y_N \subset J) \ge 1 - \epsilon$ for sufficiently large $N$. In Theorem 2 we put $R_N(\epsilon) = J$,
$f_N = f$. Then all conditions of Theorem 2 are satisfied and it follows that
$\operatorname{plim} [f(x_N) - f(y_N)] = 0$.

If, moreover, the set of discontinuity points of $f$ is closed then by Theorems
3 and 5 $d_\infty[f(x_N)] = d_\infty[f(y_N)] = d[f(y)]$.

Special cases of Corollary 2 have been proved by J. L. Doob and W. G.
Madow [2].

Theorem 5 is very useful in deriving limit distributions. 

It follows for instance from Theorem 5 that if $d_\infty(x_N) = d(x)$, $d_\infty(y_N) =
d(y)$, where $x$, $y$ are independently and normally distributed with mean 0 and
equal variances, then $d_\infty(x_N/y_N) = d(x/y)$. That is to say the distribution of
$x_N/y_N$ converges to a Cauchy distribution.
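
This limit can be examined by simulation. The sketch below is illustrative only and rests on assumptions not in the text: $x_N$ and $y_N$ are taken to be independent normalized sums of $N$ uniform variates (so that each is asymptotically standard normal), and the function names, sample sizes and seed are arbitrary. It estimates $P(|x_N/y_N| \le t)$, which for a standard Cauchy limit should be close to $(2/\pi)\arctan t$.

```python
import math
import random


def normalized_sum(rng: random.Random, n: int) -> float:
    """Sum of n uniform(-1, 1) draws scaled to unit variance (approximately normal)."""
    total = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
    # Var of uniform(-1, 1) is 1/3, so the sum has variance n/3.
    return total / math.sqrt(n / 3.0)


def fraction_ratio_within(t: float, n: int = 30, reps: int = 5000, seed: int = 0) -> float:
    """Empirical frequency of the event |x_N / y_N| <= t."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(reps):
        x_n = normalized_sum(rng, n)
        y_n = normalized_sum(rng, n)
        # |x_n| <= t * |y_n| is the same event as |x_n / y_n| <= t
        # and avoids dividing by a value near zero.
        if abs(x_n) <= t * abs(y_n):
            inside += 1
    return inside / reps


if __name__ == "__main__":
    t = 2.0
    print("empirical:", fraction_ratio_within(t))
    print("Cauchy   :", 2.0 / math.pi * math.atan(t))
```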

It also follows from Theorem 5 that under very general conditions the limit
distribution of $t = \sqrt{N}(\bar{x} - \mu)/s$ is normal ($\bar{x}$ = sample mean, $\mu$ = population
mean, $s^2$ = sample variance). For we have under very general conditions
$d_\infty[\sqrt{N}(\bar{x} - \mu)] = d(\xi)$, $\operatorname{plim} s = \sigma$, where $\xi$ is normally distributed with
variance $\sigma^2$.

Applying Theorem 5 it can also easily be shown that under very general
conditions the limit distribution of $T^2$ is a chi-square distribution if the means
of all variates are 0. Hotelling's $T^2$ (the generalized Student ratio) for a
p-variate distribution is defined as follows:

\[ T^2 = N \sum_{i=1}^{p} \sum_{j=1}^{p} A_{ij}\, \xi_i \xi_j, \quad \text{where } \|A_{ij}\| = \|s_{ij}\|^{-1}, \; \xi_i = \bar{x}^i, \]

where $s_{ij}$ is the sample covariance between $x^i$ and $x^j$.

We have $d_\infty(A_{ij}) = d(\sigma^{ij})$, where $\|\sigma_{ij}\|^{-1} = \|\sigma^{ij}\|$. If $E(x^i) = 0$ for
$i = 1, 2, \ldots, p$ then $d_\infty(\sqrt{N}\,\xi_i) = d(\eta_i)$ where the $\eta_i$ have a joint normal distribution
with covariance matrix $\|\sigma_{ij}\|$. Hence

\[ d_\infty(T^2) = d\Bigl[\sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} \eta_i \eta_j\Bigr] = d\Bigl(\sum_{i=1}^{p} \nu_i^2\Bigr), \]

where the $\nu_i$ are normally and independently distributed with variance 1.
Hence the distribution of $T^2$ converges to a chi-square distribution with $p$ degrees
of freedom.
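
The chi-square limit of $T^2$ can be illustrated numerically for $p = 2$. The following is a rough sketch under stated assumptions rather than a definitive implementation: the sample covariances use the divisor $N - 1$ (the text does not specify the divisor), the data are correlated, mean-zero and deliberately non-normal, and the names `hotelling_t2_2d` and `mean_t2` are hypothetical. Since a chi-square variate with 2 degrees of freedom has mean 2, the simulated mean of $T^2$ should be close to 2 for large $N$.

```python
import random


def hotelling_t2_2d(sample):
    """T^2 = N * sum_ij A_ij * xbar_i * xbar_j for a list of (x1, x2) pairs.

    A is the inverse of the 2x2 sample covariance matrix (divisor N - 1, an assumption).
    """
    n = len(sample)
    m1 = sum(p[0] for p in sample) / n
    m2 = sum(p[1] for p in sample) / n
    s11 = sum((p[0] - m1) ** 2 for p in sample) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in sample) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in sample) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Explicit inverse of a symmetric 2x2 matrix.
    a11, a22, a12 = s22 / det, s11 / det, -s12 / det
    return n * (a11 * m1 * m1 + 2.0 * a12 * m1 * m2 + a22 * m2 * m2)


def mean_t2(n: int = 80, reps: int = 1500, seed: int = 1) -> float:
    """Average of T^2 over repeated samples from a mean-zero bivariate population."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = []
        for _ in range(n):
            u = rng.uniform(-1.0, 1.0)
            v = rng.uniform(-1.0, 1.0)
            sample.append((u, 0.5 * u + v))  # correlated, mean zero, non-normal
        total += hotelling_t2_2d(sample)
    return total / reps


if __name__ == "__main__":
    print("simulated mean of T^2:", mean_t2(), "(chi-square with 2 d.f. has mean 2)")
```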

If the samples are drawn from a sequence of populations $\{\pi_N\}$ all with the
same covariance matrix and such that $\lim_{N \to \infty} \sqrt{N}\,\mu_{iN} = \mu_i$, where $\mu_{iN}$ is the mean
value of the $i$th variate in the $N$th population, then one sees in exactly the same
way that the limit distribution of $T^2$ is a non-central chi-square distribution with
$p$ degrees of freedom.

The limit distribution of $T^2$ has been derived by W. G. Madow [2].
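Corollary 3. Let $x_N$, $y_N$ be r-dimensional vectors, $d_\infty(y_N) = d(y)$ and
$x_N - y_N = O_p[f(N)]$ with $\lim_{N \to \infty} f(N) = 0$. Let $g(x)$ be a function admitting continuous
$j$th derivatives except in a set $R$ with $\lim_{N \to \infty} P(y_N \subset R) = 0$. Let

\[ T_j(x, a) = \sum_{i} \Bigl(\frac{\partial g}{\partial x^{i}}\Bigr)_{x=a} (x^{i} - a^{i}) + \cdots + \frac{1}{j!} \sum_{i_1, \ldots, i_j} \Bigl(\frac{\partial^{j} g}{\partial x^{i_1} \cdots \partial x^{i_j}}\Bigr)_{x=a} (x^{i_1} - a^{i_1}) \cdots (x^{i_j} - a^{i_j}); \]

then

\[ g(x_N) - g(y_N) - T_j(x_N, y_N) = o_p[f(N)^j]. \]

Since the $j$th derivatives are continuous except in a set of limit measure 0
we can determine a closed set $R(\epsilon)$ on which they are uniformly continuous and
so that $P(y_N \subset R(\epsilon)) \ge 1 - \epsilon$ for sufficiently large $N$. Then for every sequence
with $a_N - b_N = O[f(N)]$, $b_N \subset R(\epsilon)$ we have

\[ g(a_N) - g(b_N) - T_j(a_N, b_N) = o[f(N)^j]. \]

Hence Corollary 3 follows from Theorem 1.

Corollary 3 was first proved by W. G. Madow [2] and J. L. Doob [1] for the
important case that $y_N$ is a constant.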

The following example will illustrate Corollary 3. Let $x$, $y$ be normally and
independently distributed random variables with mean 0 and variance 1; $\{z_N\}$,
$\{z'_N\}$ sequences of random variables with $\operatorname{plim}_{N \to \infty} \sqrt{N}\,z_N = \operatorname{plim}_{N \to \infty} \sqrt{N}\,z'_N = 1$.
Let $x_N = x + z_N$, $y_N = y + z'_N$. We consider the function
$g(x, y) = x^3/3 + y^3/3 + 2x - 2y + 5$. Applying Corollary 1 it is easy to verify that
$g(x_N, y_N) - g(x, y) = \Omega_p[1/\sqrt{N}]$, $z_N = O_p(1/\sqrt{N})$, $z'_N = O_p(1/\sqrt{N})$. Hence applying
Corollary 3 for $j = 1$ we have

$g(x_N, y_N) - g(x, y) - (x^2 + 2) z_N - (y^2 - 2) z'_N = o_p(1/\sqrt{N})$.

Multiplying by $\sqrt{N}$ we have

$\sqrt{N}\,[g(x_N, y_N) - g(x, y)] - [(x^2 + 2) z_N + (y^2 - 2) z'_N] \sqrt{N} = o_p(1)$.

This is equivalent to

$\operatorname{plim} [\sqrt{N}\,(g(x_N, y_N) - g(x, y))] = x^2 + y^2$.

Hence the distribution of $\sqrt{N}\,(g(x_N, y_N) - g(x, y))$ converges to the chi-square
distribution with 2 degrees of freedom.

If $\operatorname{plim}_{N \to \infty} x_N = a$ and $\{\sigma_N\}$ is a sequence of numbers with $\lim_{N \to \infty} \sigma_N = 0$ such that
$d_\infty[(x_N^i - a^i)/\sigma_N] = d(\xi_i)$ where the $\xi_i$ are constants or random variables and
if $g$ admits continuous first derivatives at $x = a$ at least one of which is different
from 0, then putting $\bigl(\partial g/\partial x^i\bigr)_{x=a} = g_i$, we have

$g(x_N) - g(a) = g_1 (x_N^1 - a^1) + \cdots + g_r (x_N^r - a^r) + o_p(\sigma_N)$.

Hence applying Theorems 3 and 5 we have

(i) $d_\infty\{[g(x_N) - g(a)]/\sigma_N\} = d(g_1 \xi_1 + \cdots + g_r \xi_r)$.

That is to say the distribution of $[g(x_N) - g(a)]/\sigma_N$ converges to the distribution
of $\sum_{i=1}^{r} g_i \xi_i$ in all continuity points of the latter. A corresponding result can be
obtained from Corollary 3 if all first derivatives are 0 at $x = a$ and at least
one second derivative is different from 0 and so forth.
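
Relation (i) can be checked by simulation in a one-dimensional case. The sketch below is illustrative and all of its specifics are assumptions: $x_N$ is the mean of $N$ exponential variates with mean 1 (so $a = 1$, $\sigma_N = 1/\sqrt{N}$ and $\xi$ is standard normal), and $g(x) = x^2$, so that $g_1 = 2$ and the limit distribution of $[g(x_N) - g(a)]/\sigma_N$ should have standard deviation 2.

```python
import math
import random


def scaled_deviation(rng: random.Random, n: int) -> float:
    """[g(xbar) - g(a)] / sigma_N with g(x) = x**2, a = 1, sigma_N = 1/sqrt(n)."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (xbar ** 2 - 1.0)


def simulated_sd(n: int = 200, reps: int = 2000, seed: int = 2) -> float:
    """Sample standard deviation of the scaled deviation over `reps` replications."""
    rng = random.Random(seed)
    vals = [scaled_deviation(rng, n) for _ in range(reps)]
    mean = sum(vals) / reps
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / (reps - 1))


if __name__ == "__main__":
    # The limit in (i) is g_1 * xi = 2 * xi with xi standard normal, i.e. s.d. 2.
    print("simulated s.d.:", simulated_sd(), "limit value: 2.0")
```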

A method of deriving limiting distributions and limit standard deviations based
on (i) is known as the $\delta$-method and has been extensively applied in statistical
literature.

ON A MEASURE PROBLEM ARISING IN THE THEORY OF
NON-PARAMETRIC TESTS

By Henry Scheffé

1. Introduction. While the contents of this paper have broader statistical
implications, they were motivated by the following problem: Given two samples,
$(Y_1, Y_2, \ldots, Y_m)$ and $(Z_1, Z_2, \ldots, Z_n)$ from univariate populations with
cumulative distribution functions (c.d.f.'s) $F(x)$ and $G(x)$, respectively, and
given furthermore that $F$ and $G$ are members of a certain class $\Omega$ of c.d.f.'s, to
test the hypothesis that $F = G$. We shall refer to this as "the problem of two
samples" [8]. It is an example of what Wolfowitz has called problems of the
non-parametric case [8].

For the theory of non-parametric problems the following classification of
c.d.f.'s is appropriate: Let $\Omega_0$ be the class of all univariate c.d.f.'s, that is, the class
of all monotone non-decreasing functions $F(x)$ for which $F(-\infty) = 0$,
$F(+\infty) = 1$, and $F(x) = F(x + 0)$. For every $F \in \Omega_0$ we may conceive of a
corresponding random variable $X$ such that $\Pr\{X \le x\} = F(x)$. For some
purposes we may desire to rule out the class $\Omega^{(0)}$ of degenerate c.d.f.'s given by the
formula $F(x) = 0$ for $x < x_0$, $F(x) = 1$ for $x \ge x_0$, where $x_0$ is any real number.
Let then $\Omega_1$ be the class of non-degenerate c.d.f.'s, $\Omega_1 = \Omega_0 - \Omega^{(0)}$. Let $\Omega_2$ be the class
of all continuous $F(x)$, and let $\Omega_3$ be the class of all absolutely continuous $F(x)$,
that is, all $F(x)$ for which there exists a probability density function (p.d.f.)
$f(x)$ such that

(1) $F(x) = \int_{-\infty}^{x} f(\xi)\,d\xi$.

Finally, let $\Omega_4$ be the class of all $F(x)$ which may be expressed in the form (1) with
$f(x)$ continuous.

Various solutions of non-parametric problems have been given under the
restriction that the c.d.f.'s belong to one of the classes $\Omega_i$. For example,
Kolmogoroff [2] has indicated how a confidence belt for an unknown $F$ may be
formed with no assumptions on $F$, that is $F \in \Omega_0$. Wald and Wolfowitz earlier¹
gave a more general solution of the same problem [5], and also of the problem
of two samples [6], under the restriction that the c.d.f.'s are members of $\Omega_2$.
The latter problem was considered by Dixon [1] for the c.d.f.'s in $\Omega_3$. Wilks'
theory of tolerance intervals [7] assumes $F \in \Omega_4$. The class $\Omega_1$ has been defined
above because it is ordinarily the largest class of statistical interest. We note

(2) $\Omega_0 \supset \Omega_1 \supset \Omega_2 \supset \Omega_3 \supset \Omega_4$.

¹ See, however, a still earlier paper by Kolmogoroff [11] in which he gave the
distribution theory required for his solution.

It is to be understood throughout that the word "region" (also the symbol $w$)
always denotes a Borel set in a k-dimensional ($k > 1$) sample space $W$ (Euclidean).
A "null set" will always mean a Borel set of measure zero.
Returning now to the problem of two samples, let $m + n = k$, $X_i = Y_i$
$(i = 1, 2, \ldots, m)$, $X_i = Z_{i-m}$ $(i = m + 1, \ldots, k)$. Denote by $E$ the point
$(X_1, \ldots, X_k)$. Proceeding along the lines of the usual parametric theory,
we may seek a region $w$ (the "critical region") such that $\Pr\{E \in w\}$ is the same
constant $\alpha$ ("significance level"; $\alpha \ne 0$ or 1) for all $F$ in a particular class $\Omega_i$,
if $F = G$. This raises the following question: Define

\[ P(w \mid F) = \int_w dF_k(x_1, \ldots, x_k), \]

where

\[ F_k(x_1, \ldots, x_k) = \prod_{i=1}^{k} F(x_i). \]

We shall say that a region $w$ has the property $\pi_i$ if for all $F \in \Omega_i$, $\alpha = P(w \mid F)$
is independent of $F$ and $0 < \alpha < 1$. The question then is, for a fixed $i$, how
can we characterize regions $w$ with the property $\pi_i$? Partial answers to this
question are given in the next section.

In the language of measure theory the question is this: Let $\mu$ be any measure
on the real line, such that the measure of the whole line is unity, and form the
"power" measure $\mu^k$ in Euclidean k-space, that is, the product measure obtained
by using $\mu$ on each axis. For certain large classes $C_i$ (corresponding to the $\Omega_i$
defined above, $i = 1, 2, 3, 4$) of measures $\mu$, what can we say about the existence
and structure of sets of points in the k-space which have the property that
their "power" measure is the same for all measures $\mu$ in $C_i$?

2. Theorems. Our first theorem tells us that if we want regions $w$ with the
desired property, we must restrict $F$ to a smaller class than $\Omega_1$.

Theorem 1: There is no $w$ with the property $\pi_1$.

To prove the theorem, suppose the contrary. Then there exists a $w$ for which
$P(w \mid F) = \alpha$ for all $F \in \Omega_1$ and $\alpha \ne 0$ or 1. Let $L$ be the line $x_1 = x_2 = \cdots = x_k$,
and suppose first there is a point $E_0$ of $L$ in $w$. Let $E_0 = (a, a, \ldots, a)$, and
let $F_h(x)$ be any $F \in \Omega_1$ such that $\Pr\{X = a \mid F_h\} = h$ $(0 < h < 1)$. Then

\[ \alpha = P(w \mid F_h) \ge P(E_0 \mid F_h) = \Pr\{\text{all } X_i = a \mid F_h\} = \prod_{i=1}^{k} \Pr\{X_i = a \mid F_h\} = h^k. \]

By hypothesis $\alpha$ is independent of $h$. But $h$ may be chosen arbitrarily close to 1.
Hence $\alpha = 1$, a contradiction. If no points of $w$ lie on $L$, the above reasoning
applies to $w' = W - w$, since $\alpha' = P(w' \mid F) = 1 - \alpha$ is independent of $F \in \Omega_1$,
and $w'$ contains an $E_0$ on $L$; therefore $\alpha' = 1$, $\alpha = 0$.

In order to see what kind of structure might yield a $w$ of the desired type,
let us for the moment consider the class $\Omega_3$ of c.d.f.'s. Then there exists a p.d.f.
over $W$, namely $f(x_1) f(x_2) \cdots f(x_k)$. For any $f(x)$ and any point² $E$, this p.d.f.
has the same value at all points $E'$ whose coordinates are permutations of the
coordinates of $E$. This suggests that suitable regions $w$ can be built up by
considering points $E$ for which no two coordinates are equal and putting a fixed
fraction of the set $\{E'\}$ in $w$ in such a way that $w$ is a Borel set. Our next
theorem justifies this process for the wider class $\Omega_2$.

Let us say that $w$ has the structure $S$ if for every point $E = (x_1, \ldots, x_k)$ with
no two coordinates equal, $M$ points $(0 < M < k!)$ of the set $\{E'\}$, obtained by
permuting the coordinates of $E$, are in $w$ and the remaining $k! - M$ are not.³

Theorem 2: A sufficient condition that $w$ have the property $\pi_2$ is that it have the
structure $S$.

In proving the theorem it will be convenient to separate the $k!$ points of
every set $\{E'\}$ by means of regions $u_i$ $(i = 1, \ldots, k!)$, such that each $u_i$ contains
one and only one point of $\{E'\}$. Order the $k!$ permutations of the integers
$1, 2, \ldots, k$ in any manner so that $(1, 2, \ldots, k)$ is the first. Let $(p_{i1}, \ldots, p_{ik})$
be the $i$th permutation $(i = 1, 2, \ldots, k!)$ and define $u_i$ as the region
$x_{p_{i1}} < x_{p_{i2}} < \cdots < x_{p_{ik}}$. The collection $\{u_i\}$ is disjoint and covers all of $W$ except
the set $H$ of points on hyperplanes $x_i = x_j$ $(i \ne j)$. The transformation $T_i$:
$x_{p_{i1}} \to x_1, \ldots, x_{p_{ik}} \to x_k$ maps $u_i$ onto $u_1$ in such a way that $F_k$ remains
invariant.

Suppose now that $w$ satisfies the conditions of the theorem. The removal
of $H \cap w$ from $w$ does not⁴ affect $P(w \mid F)$ for any $F \in \Omega_2$. Hence

\[ P(w \mid F) = \sum_{i=1}^{k!} P(w \cap u_i \mid F) = \sum_{i=1}^{k!} \int_{w \cap u_i} dF_k = \sum_{i=1}^{k!} \int_{u_i} c_{w \cap u_i}(E)\, dF_k , \]

where $c_S(E)$ denotes the characteristic function of a set $S$, that is, $c_S(E) = 1$
if $E \in S$, 0 otherwise. Next map each of the regions $u_i$ onto $u_1$ by means of $T_i$.
$F_k$ is invariant, while $c_{w \cap u_i}(E) \to h_i(E)$ such that $\sum_{i=1}^{k!} h_i(E) = M$ for $E \in u_1$.
Then

\[ P(w \mid F) = \sum_{i=1}^{k!} \int_{u_1} h_i(E)\, dF_k = \int_{u_1} \sum_{i=1}^{k!} h_i(E)\, dF_k = M \int_{u_1} dF_k . \]

² Previously $E$ denoted a random point $(X_1, \ldots, X_k)$; now it denotes an arbitrary point
$(x_1, \ldots, x_k)$ in the sample space $W$. This will cause no confusion.

³ Regions of structure $S$ may be regarded as the result of applying R. A. Fisher's
randomization process [10] in the most general possible way to the problem of two samples.
Special cases of regions with structure $S$ have been considered by Feller [9] and Neyman
[12], and are implied by all writers [e.g., 6] who have attacked the problem of two samples
by the method of ranks.

⁴ This may be seen by writing $P(H \mid F)$ in the form of an integral over $W$ of $c_H(E)\, dF_k$,
where $c_H$ is the characteristic function of the set $H$, and applying the Fubini theorem [4].

But

\[ 1 = P(W \mid F) = \sum_{i=1}^{k!} \int_{u_i} dF_k , \]

and by use of $T_i$ we find

\[ \int_{u_i} dF_k = \int_{u_1} dF_k \qquad (i = 1, \ldots, k!). \]

Hence

\[ \int_{u_1} dF_k = 1/k! , \]

and

\[ P(w \mid F) = M/k! \]

for all $F \in \Omega_2$. Thus $w$ has the property $\pi_2$.
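
The counting behind Theorem 2 can be made concrete by enumeration. The sketch below is illustrative: the particular region (all $Y$'s smaller than all $Z$'s) and the function names are assumptions chosen for the example, not constructions from the text. For a point $E$ with distinct coordinates it counts the number $M$ of permuted points $E'$ that fall in the region and returns $M/k!$, which by the theorem is the size $\alpha$ for every continuous $F$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def in_region(point, m):
    """Hypothetical critical region w: the first m coordinates (the Y's) are the m smallest."""
    return max(point[:m]) < min(point[m:])


def count_in_region(point, m):
    """M = number of coordinate permutations E' of `point` that lie in w."""
    return sum(1 for perm in permutations(point) if in_region(perm, m))


def size_of_region(point, m):
    """alpha = M / k!, exact as a fraction."""
    k = len(point)
    return Fraction(count_in_region(point, m), factorial(k))


if __name__ == "__main__":
    # Two different points with distinct coordinates give the same M, as structure S requires.
    for e in [(0.3, 1.7, -2.0, 5.5), (10, 20, 30, 40)]:
        print(e, "M =", count_in_region(e, 2), "alpha =", size_of_region(e, 2))
```

With $m = n = 2$ the count is $M = 2!\,2! = 4$ out of $4! = 24$ permutations, so $\alpha = 1/6$ whatever the point $E$.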

$H$ is an example of a set in the class $N_2$ of regions $w$ for which $P(w \mid F) = 0$
for all $F \in \Omega_2$. Since if regions $w_1$ and $w_2$ differ by a set $w \in N_2$, $P(w_1 \mid F) =
P(w_2 \mid F)$ for all $F \in \Omega_2$, we have

Corollary 1: For $w$ to have the property $\pi_2$ it is sufficient that it differ from a region
with structure $S$ by a region in $N_2$.

Defining similarly the class $N_3$ as that class of regions $w$ for which $P(w \mid F) = 0$
for all $F \in \Omega_3$, we see that $N_3$ is precisely the class of null sets.

Corollary 2: A sufficient condition that $w$ have the property $\pi_3$ is that it have
the structure $S$ except for a null set.

The mildest restriction under which the writer has been able to concoct a
necessity proof is that the boundary of $w$ be a null set. This class of regions $w$
includes (to the best of his knowledge) all critical regions heretofore used in
practice.

Theorem 3: For a $w$ whose boundary is a null set, a necessary condition that
$w$ have the property $\pi_4$ is that it have the structure $S$ except on a null set.
Suppose then that $w$ has the property $\pi_4$, and its boundary $B$ is a null set.
Let $B_i$ be the transform of $B$ under $T_i$. Let the null set $H'$ be the union of $H$
with all $B_i$, and let $w_1 = w - H'$, $w_2 = (W - w) - H'$. Then $w_1$ and $w_2$ are open
sets and $P(w_1 \mid F) = P(w \mid F)$ for all $F \in \Omega_4$. Furthermore for any $E$ either
all or none of the points of $\{E'\}$ are in $w_1 \cup w_2$. Now consider any $E_0 \in w_1$
and let $M_0$ be the number of points of $\{E'_0\}$ in $w_1$, so that $k! - M_0$ of $\{E'_0\}$ are
in $w_2$. Let $E_0 = (\xi_1, \ldots, \xi_k)$, and $2\delta_1 = \min |\xi_i - \xi_j|$ for $i \ne j$. Since $w_1$
and $w_2$ are open, cubes with sides parallel to the coordinate hyperplanes ($x_j$ =
constant) and edges of length $2\delta_2$ may be centered on the points $E'_0$ so that each
cube is entirely in $w_1$ or entirely in $w_2$, by choosing $\delta_2$ sufficiently small. Choose
$\delta$ so that $\delta > 0$, $\delta < \delta_1$, $\delta < \delta_2$. The set $\{E'_0\}$ is a subset of the set $\{E''_0\}$ of
$k^k$ points whose coordinates are in the set $\xi_1, \ldots, \xi_k$ allowing repetitions. For
each point $E''_0 = (\xi_{i_1}, \ldots, \xi_{i_k})$ in $\{E''_0\}$ construct a cube $C_{i_1, \ldots, i_k}$ as above
with center at $E''_0$ and edge $2\delta$. These cubes are disjoint. Let $f_i(x)$ be a p.d.f.
such that the corresponding c.d.f. is in $\Omega_4$ and $f_i(x) = 0$ for $|x - \xi_i| \ge \delta$
$(i = 1, \ldots, k)$. Define the p.d.f.

\[ f^{(s)}(x) = s^{-1} \sum_{i=1}^{s} f_i(x) \qquad (s = 1, \ldots, k). \]

Then the corresponding c.d.f. $F^{(s)}$ is in $\Omega_4$. We have

\[ \alpha = P(w \mid F^{(s)}) = \int_w \prod_{j=1}^{k} f^{(s)}(x_j)\, dW = s^{-k} \int_w \sum_{i_1, \ldots, i_k = 1}^{s} f_{i_1}(x_1) \cdots f_{i_k}(x_k)\, dW, \]

where $dW = dx_1 \cdots dx_k$. Bring the last summation sign outside the integral
sign, and note that $f_{i_1}(x_1) \cdots f_{i_k}(x_k) = 0$ outside $C_{i_1, \ldots, i_k}$. Then

(3) $\sum_{i_1, \ldots, i_k = 1}^{s} I_{i_1, \ldots, i_k} = s^k \alpha$,

where

(4) $I_{i_1, \ldots, i_k} = \int_{w \cap C_{i_1, \ldots, i_k}} f_{i_1}(x_1) \cdots f_{i_k}(x_k)\, dW$.

Our argument depends on certain sums of $I_{i_1, \ldots, i_k}$ having the property that
the sum is equal to $\alpha$ times the number of terms in the sum. In order to save
space we shall say that if $\Sigma$ is such a sum, then $\Sigma \in R$, $R$ being the class of such
sums. Clearly all sums (3) are in $R$. Let $\{S_{r\nu}\}$ be the subsets of $r$ $(r = 1, \ldots,
k)$ different integers in the set $1, 2, \ldots, k$ $(\nu = 1, \ldots, {}_kC_r)$, and let $\Sigma_{r\nu}$ be the
sum of all $I_{i_1, \ldots, i_k}$ for which the index $i_1, \ldots, i_k$ consists only of integers in
$S_{r\nu}$ and such that all the integers of $S_{r\nu}$ appear in the index. We wish to prove
that $\Sigma_{k1}$, the sum of $I$ for cubes centered on the points of $\{E'_0\}$, is in $R$. To
accomplish this we make an induction on $r$: If we assume all $\Sigma_{r\nu} \in R$ for $r < s$, then
we can show all $\Sigma_{s\nu} \in R$ $(s = 2, \ldots, k)$. No generality is lost in taking $S_{s\nu}$ as
the set of integers $1, 2, \ldots, s$. Now consider the left member of (3). Some
thought will show⁵ that it may be broken down into $\Sigma_{s\nu}$ plus a sum of $\Sigma_{r\nu}$ where
$r < s$. But the left member of (3) is in $R$, and by hypothesis so are all $\Sigma_{r\nu}$ with
$r < s$. It follows that $\Sigma_{s\nu}$ is also in $R$. To see that $\Sigma_{1\nu} \in R$ $(\nu = 1, \ldots, k)$, let
$S_{1\nu}$ be $\nu$ and note that $\Sigma_{1\nu}$ consists only of $I_{\nu, \ldots, \nu}$. Putting $s = 1$ in (3) we
have $I_{1, \ldots, 1} = \alpha$, and likewise $\Sigma_{1\nu} = I_{\nu, \ldots, \nu} = \alpha$. Thus $\Sigma_{1\nu} \in R$.

⁵ To illustrate the reasoning, suppose $s = 4$. If $S_{r\nu}$ is the set of (different) integers $a$,
$b, \ldots, h$, denote $\Sigma_{r\nu}$ by <a, b, ..., h>, that is, <a, b, ..., h> is the sum of all $I$ whose
indices contain $a, b, \ldots, h$ and no other integers. Then the left member of (3) contains
terms from <1, 2, 3, 4>, <1, 2, 3>, <1, 2, 4>, <1, 3, 4>, <2, 3, 4>, <1, 2>, <1, 3>,
<1, 4>, <2, 3>, <2, 4>, <3, 4>, <1>, <2>, <3>, <4>. Every term of the left
member of (3) is in one of these sums < >. No term can appear in 2 sums < >. Every
term of each sum < > appears in the left member of (3). Thus the left member is the
sum of all sums < > listed above, and by hypothesis, all but the first sum < > are in $R$.

We have at this stage that $\Sigma_{k1} = k!\,\alpha$. But as we already noted, of the cubes
$C$ associated with the integrals $I$ in the sum $\Sigma_{k1}$, $M_0$ are entirely inside $w_1$ and
$k! - M_0$ entirely outside $w_1$. For the set of $M_0$ terms in $\Sigma_{k1}$ corresponding to
the cubes $C$ in $w_1$ the region of integration $w \cap C$ in (4) is actually $C$, and for the
remaining set of terms in $\Sigma_{k1}$ the region of integration is the empty set.
Furthermore if $w \cap C = C$ in (4), the corresponding $I$ is unity. Hence $\Sigma_{k1} = M_0 = k!\,\alpha$,
$\alpha = M_0/k!$. If we now repeated the process with any other point $E_1 \in w_1$
instead of $E_0$, and let $M_1$ be the number of points of $\{E'_1\}$ in $w_1$, we would get
$\alpha = M_1/k!$. Therefore $M_1 = M_0$. From $0 < \alpha < 1$, we conclude $0 < M_0 < k!$.
Thus $w_1$ has the structure $S$.

The exceptional null set allowed for in the statement of Theorem 3 entered 
-the proof when we removed wO H' from w. Had we assumed that the boundary 
B t N 2 , then the exceptional set would be in . As a corollary to the reasoning 
used in the proof we thus get 

Corollary 3: If the boundary of $w$ is in $N_2$, a necessary condition that $w$ have the property $\pi_4$ is that $w$ have the structure $S$ except on a subset in $N_2$.

Finally, because of (2), any sufficient (necessary) condition for $w$ to have the property $\pi_i$ is sufficient (necessary) for $w$ to have the property $\pi_j$ if $j > i$ ($j < i$). Hence we may replace $\pi_2$ in Theorem 2 and Corollary 1 by $\pi_3$ or $\pi_4$, $\pi_3$ in Corollary 2 by $\pi_4$, $\pi_4$ in Theorem 3 and Corollary 3 by $\pi_3$ or $\pi_2$. This yields

Corollary 4: If the boundary of $w$ is a null set, a necessary and sufficient condition that $w$ have the property $\pi_3$ (or $\pi_4$) is that it have the structure $S$ except on a null set.

Corollary 5: If the boundary of $w$ is a region in $N_2$, a necessary and sufficient condition that $w$ have the property $\pi_2$ (or $\pi_3$ or $\pi_4$) is that it have the structure $S$ except on a subset in $N_2$.

3. Remarks. Wald and Wolfowitz [6, 8] in their work on the problem of two samples for the case $F \in \Omega_2$ have imposed the following restriction on any statistic used to test the null hypothesis: The statistic must be a function of $V$ only, where the sequence $V$ of $k$ elements is formed as follows: Rank the $X_i$ of the sample in ascending order of magnitude (ignoring cases where two $X_i$ are equal), and if the $i$-th element in this rank order is a $Y$ put the $i$-th element of $V$ equal to zero, else unity. This means that the resulting critical region always consists of the union of $s$ of the regions $w_i$ defined in section 2, where $s$ is a multiple of $m!\,n!$. The results of our section 2 show that this restriction is not necessary, if all we require is that $\Pr\{E \in w\}$, where $w$ is the critical region and $E$ the sample point, be the same constant $\alpha$ whenever the null hypothesis is true. In fact a valid (but probably not very efficient) solution of the problem of two samples has been proposed by Pitman [3] in which the statistic is not a function of $V$ only.
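The construction of the sequence $V$ described above can be sketched in a few lines of Python. This is only an illustration: the function name `rank_sequence_v` and the argument names `ys` and `zs` (the two samples) are assumptions introduced here, and ties are assumed absent, as in the text.

```python
def rank_sequence_v(ys, zs):
    """Build V: rank the pooled sample in ascending order and write
    0 where the ranked element comes from the Y sample, 1 otherwise."""
    # Label every observation with its origin (0 = Y sample, 1 = other sample).
    labelled = [(value, 0) for value in ys] + [(value, 1) for value in zs]
    # Ascending order of magnitude; equal observations are assumed not to occur.
    labelled.sort(key=lambda pair: pair[0])
    return [label for _, label in labelled]


if __name__ == "__main__":
    print(rank_sequence_v(ys=[0.7], zs=[0.3, 1.2]))
```

Any statistic computed from the returned list alone is unchanged by a monotone increasing transformation of the observations, which is the invariance restriction discussed in these remarks.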

Putting further requirements on the critical region will lead to a more restricted class than the class of regions having essentially the structure $S$. For instance, from section 2 it follows that the significance level $\alpha$ can be any of the values $i/k!$ ($i = 1, \cdots, k! - 1$). But if we lay down a symmetry condition to the effect that if $(y_1, \cdots, y_m, z_1, \cdots, z_n)$ is in $w$, all points obtainable by permuting the $y$'s among themselves and the $z$'s among themselves be in $w$, then $\alpha$ must be a multiple of $m!\,n!/k!$. Again, if we impose the condition that any statistic $T(X_1, \cdots, X_k)$ used to test the null hypothesis remain invariant when all the $X_i$ are subjected to the same topological transformation of the real line onto itself, then Wald and Wolfowitz [6] have shown that $T$ must be a function of $V$ only, so that $w$ has the special structure described above. It would seem desirable, when the subject of statistical inference in the non-parametric case may be entering a stage of rapid development, to be clear about the assumptions necessary to restrict the critical region to a particular class.
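As a small numerical illustration of the two sets of attainable significance levels just mentioned, the following sketch lists the multiples of $1/k!$ and, under the symmetry condition, of $m!\,n!/k!$ that lie strictly between 0 and 1. The function names are hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import factorial


def level_step(m, n, symmetric=False):
    """Smallest attainable level: 1/k! in general, m! n!/k! under symmetry (k = m + n)."""
    k = m + n
    numerator = factorial(m) * factorial(n) if symmetric else 1
    return Fraction(numerator, factorial(k))


def attainable_levels(m, n, symmetric=False):
    """All multiples of the step that lie strictly between 0 and 1."""
    step = level_step(m, n, symmetric)
    count = int(1 / step)  # number of steps needed to reach 1
    return [i * step for i in range(1, count)]


if __name__ == "__main__":
    print(attainable_levels(2, 2, symmetric=True))
```

For $m = n = 2$ the general case gives the 23 levels $i/24$, while the symmetry condition leaves only the multiples of $1/6$.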

In concluding these remarks, we quote with the kind permission of Dr. Wolfowitz, from some correspondence with the writer. Important work has been done on non-parametric tests under the restriction that the statistic used be invariant under topological transformation. The following statement as to why this restriction might be imposed will therefore interest the reader: "... there are arguments pro and con ... Pro. If the statistic be not invariant, this could happen: Two scientists working on the same problem and having the same observations to interpret might come to opposite conclusions if one used one scale of measurement and the other used a monotone function of that scale. Con. The criterion of topologic invariance of the statistic is a restriction on our freedom. Furthermore it cannot be imposed except in the univariate case ([8], p. 270)."
FURTHER RESULTS ON PROBABILITIES OF A FINITE NUMBER OF EVENTS

By Kai Lai Chung
Tsing Hua University, Kunming, China

In a recent paper$^{1}$ the author has generalized some inequalities of Fréchet to the following:

Let $n \ge a \ge m \ge 1$, and let



$\Delta F(a) = F(a) - F(a + 1), \qquad \Delta^{h} F(a) = \Delta(\Delta^{h-1} F(a));$

then

$\Delta A_a^{(m)} \ge 0, \qquad \Delta^{2} A_a^{(m)} \ge 0.$

Using a generalized Poincaré's formula, P. L. Hsu has improved these inequalities to the recurrence formula stated below.

Hsu's formula is


« AA<"° = 

n — m 

Proof: We have 

Pm((a)) = S, (l - i ) 

For a fixed “a” summing over all (a) * (v), 

(L- !)(” i 0 s *««» 

{ n a O" *««'»} 


1 "On the probability of the occurrence of at least $m$ events among $n$ arbitrary events," Annals of Math. Stat., Vol. 12 (1941), pp. 328-338. We use throughout the same notation used in this paper, and that referred to in footnote 3.
Applying the formula repeatedly, we obtain for $0 \le h \le n - a$,

*M>- = (“ + ™-')(»- ~y 4's". 

Since every $A \ge 0$, we have, for $0 \le h \le n - a$,

$\Delta^{h} A_a^{(m)} \ge 0,$

which includes my former results.

Further, we may write (1) as

(2)    $(n - a)P_a^{(m)} = (a + 1 - m)P_{a+1}^{(m)} + mP_{a+1}^{(m+1)}$

or

$(a + 1)P_{a+1}^{(m)} - (n - a)P_a^{(m)} = m\left(P_{a+1}^{(m)} - P_{a+1}^{(m+1)}\right) = mP_{a+1}^{[m]}.$

It follows that

(3)    $(a + 1)P_{a+1}^{(m)} - (n - a)P_a^{(m)} \ge 0.$

From (2) it also follows that

(4)    $(n - a)P_a^{(m)} - (a + 1 - m)P_{a+1}^{(m)} \ge 0,$

which is the same as $\Delta A_a^{(m)} \ge 0$. Combining (3) and (4) we obtain


$$\frac{n - a}{a + 1}\, P_a^{(m)} \;\le\; P_{a+1}^{(m)} \;\le\; \frac{n - a}{a + 1 - m}\, P_a^{(m)}.$$
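A recurrence of the form (2), $(n - a)P_a^{(m)} = (a + 1 - m)P_{a+1}^{(m)} + mP_{a+1}^{(m+1)}$, can be checked by brute force on a small example. The sketch below rests on an assumption about notation taken from the paper cited in footnote 1 and not restated here: $P_a^{(m)}$ is read as the sum, over all combinations of $a$ of the $n$ events, of the probability that at least $m$ events of the combination occur. The joint distribution is an arbitrary random one, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
import random


def random_joint_distribution(n, seed=0):
    """Arbitrary joint distribution over the 2**n occurrence patterns (exact fractions)."""
    rng = random.Random(seed)
    weights = {pattern: rng.randint(1, 9) for pattern in product((0, 1), repeat=n)}
    total = sum(weights.values())
    return {pattern: Fraction(w, total) for pattern, w in weights.items()}


def sum_at_least(prob, n, a, m):
    """Assumed meaning of P_a^(m): sum over a-subsets of P(at least m of the subset occur)."""
    result = Fraction(0)
    for subset in combinations(range(n), a):
        for pattern, p in prob.items():
            if sum(pattern[i] for i in subset) >= m:
                result += p
    return result


def check_recurrence(n, seed=0):
    """True if (n-a) P_a^(m) = (a+1-m) P_{a+1}^(m) + m P_{a+1}^(m+1) for all admissible a, m."""
    prob = random_joint_distribution(n, seed)
    for m in range(1, n):
        for a in range(m, n):
            lhs = (n - a) * sum_at_least(prob, n, a, m)
            rhs = ((a + 1 - m) * sum_at_least(prob, n, a + 1, m)
                   + m * sum_at_least(prob, n, a + 1, m + 1))
            if lhs != rhs:
                return False
    return True


if __name__ == "__main__":
    print(check_recurrence(4))
```

Under this reading the identity holds outcome by outcome, so the check does not depend on the particular distribution drawn; the two-sided bound on $P_{a+1}^{(m)}$ then follows because both terms on the right of the recurrence are non-negative.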


If we take the special case $m = 1$ and instead of the original events $E_1, \cdots, E_n$ consider their negations, we easily obtain

:+-“{(:) - ««>} s (:) - &««-»«^{(:) - ««>}• 

This is equivalent to a result given by Fréchet.$^{2}$

There is an analogue of Hsu's formula for $P_a^{[m]}$, as follows:

Let $n \ge a \ge m \ge 1$, and let

/n __ g[m] 

\a — 7 n) “ “ ’ 

then 

AP‘ ml = B'# 11 . 

n — m 


It follows that for 0 S h ^ n — a, 

A*P‘ m| S 0. 


2 "Événements compatibles et probabilités fictives," C. R. Acad. Sc., Vol. 208 (1939).

The other results on $p_m$ in the paper$^{1}$ also have analogues for $p_{[m]}$. For the result on conditions of existence see the author's recent paper.$^{3}$ Here we shall state the following extension of Boole's inequality.

For $2l + 1 \le n - a$ and $2l \le n - a$ respectively, we have

E (-1 y ( m £ l ) ^ p w (W) g E (-1)’ ( m £ *) s m+ ,(W) 

Proof. We have 

Hence, 



*-*•«-» - s {s <- *>• (”» ’) (»i 0} >» 

- ?,.,«*» + g (”* + '*) ± (-!)• (?) »,.««,)> 

= pm(M) + g 3 1 )pi*+*i((*')) 


The inequalities follow immediately. 

Finally, we record two formulas which express p„(0)) in terms of Pi m] ((v)) 
and in terms of -PS."' 1 ((r)) for a fixed m and langing b’ s. Formulas which express 
PwCM) in both ways have been given 2 
We have, 

(^- 1 i) P((7 ))= £(- 1) ‘~ m E Pm«/3)> 

\' re V 4-" Cfl) « (7) 

Hence 

(m - l) = E E (-l) 6_m E Pm((/3)) 

V" V <T) « ( y) t-m m < (7) 

= E (-D 6 - m (^r E Pm (m 

6>-m \C 0/ (/3) < (*) 

By a generalized Poincare’s formula, we get 


p«(W) = e c-ir m e (-ir a f c ~ JV” “ ?V C " l ,V 

o-max{a,J) \Cl 1/ \C •— 0 J \7fl — 1 J 

= V ( _ ir -«+4-m /t-mWa-lV 1 p( «, 
k ' \» - Vm - 1^ n ■ 


Pl ,n> 


3 "On fundamental systems of probabilities of a finite number of events," Annals of Math. Stat., Vol. 14 (1943), pp. 123-134.

Similarly we have


:t)* 

P .(W) = t c- i) 1 -"| t (-iy /c ' 


[c-nmx(a,!i) 


U — 1/ vc — 5/ 



Dim] 


It remains to be seen whether the series in the curly brackets can be summed.

Using a formula in footnote 3, we may obtain the desired formula in another way. We have, in fact,

m 

0.(00) = £ Pm(00) 

o«*a 

n n 

= £ £ (-1)" 


i-c+6-m I b ffl\ / C \ 


<n - c/\m 


Prm 


c=a 6^>T7i+n—c 

-TmH £ .(-ir(' 6 „ _ ’"V!VVj* , («) 


b™wi 




— c/ 


+ £ m V c Wl" 

fc-m+»-«+i b- 4 \n-cj\m) J 


(M). 


The “complete” series 


c , »tn+n—b 




- !)“■ 

The “incomplete” series we denote by 

=e (-ir(i ::)(:)"= e (-i)'( 6 7)( n ;T 

Then we may write 

m+n—o / l\-l n 

P.(W)= £ -(?_:) Pl mI + £ a, 6, m)Pl ml . 

b=m 71 \0 t/ i=m+n—o+l 



ON THE PROBLEM OF TESTING HYPOTHESES 


By R. v. Mises 
Harvard University 

1. Introduction. The following is known as the problem of testing a simple statistical hypothesis. The probability distribution of a variate $X$ depends on a parameter $\vartheta$. In the course of experiments each time a value $x$ of $X$ is observed, one pronounces one of the two assertions: "$\vartheta$ equals $\vartheta_0$" or "$\vartheta$ is different from $\vartheta_0$." The first assertion is made when the observed value $x$ falls in a "region of acceptance" $A$, the second, if $x$ falls in the complementary region $\bar A$. What is the chance of these assertions being correct and how can $A$ be chosen to make this chance as high as possible?

The distribution for the variate $X$ is considered as given. Let $P(x \mid \vartheta)$ be the probability of the value of $X$ being $\le x$. It is obvious that to know $P(x \mid \vartheta)$ is not sufficient for computing the success or error chances of the above assertions. There is another distribution function $P_0(\vartheta)$ involved which we may call the initial or the a priori or the over-all distribution of the parameter $\vartheta$. The meaning of $P_0(\vartheta)$ is as follows. In the infinite sequence of trials there will be among the first $N$ experiences $N_1$ cases where the assertion that the parameter value is $\le \vartheta$ proves correct. Then $P_0(\vartheta)$ is the limit of the ratio $N_1/N$ when $N$ tends to infinity. If $N_0$ is the number of cases in which the actually pronounced assertions $\vartheta = \vartheta_0$ or $\vartheta \ne \vartheta_0$ respectively, prove correct, the limit of $N_0/N$ is the success chance and of $1 - N_0/N$ the error chance of the test under consideration. It would not make any sense to assume that an error chance exists but the over-all chance $P_0(\vartheta)$ does not.$^{1}$

The success and error chances for the assertions $\vartheta = \vartheta_0$ and $\vartheta \ne \vartheta_0$ depend on both functions $P(x \mid \vartheta)$ and $P_0(\vartheta)$. But in most practical cases nothing or very little is known about the parameter distribution. Usually, only the limits within which $\vartheta$ varies are known, or a set of distinct values is given which $\vartheta$ can assume. Therefore, the problem of testing a hypothesis must be modified in the following way. We ask: What can be said about the error and success chances of the two alternative assertions and about the choice of the region of acceptance, if $P_0(\vartheta)$ is entirely or partly unknown? This form of the question corresponds more or less to the conception generally adopted today.

In section 4 of this paper a complete answer to the question is presented for the case of a parameter distribution that is entirely unknown except for the range of possible $\vartheta$-values. This solution, with the restriction to a parameter assuming distinct values only, was already given by Robert W. B. Jackson in a paper devoted mainly to some genetical problems [1]. The particular circumstances prevailing under the restriction to distinct parameter values will be discussed in section 8. In section 6 the result is extended to composite hypotheses and in section 7 to problems in several dimensions. An important case of restrictions imposed on $P_0(\vartheta)$ is discussed in section 9.

1 The expression "chance" rather than "probability" is used here since no randomness is required. Cf. the author's paper [2] p. 157.

In the preceding lines the subject of testing a statistical hypothesis was presented in its simplest form, with one scalar variate and one parameter, in order to discard all non-essential complications which would serve only to veil the principal point. For the same reason it is to be understood, in the following text, that region (in one dimension) will mean an interval or a finite number of intervals, and distribution will mean a set of concentrated values at distinct points with a continuous density in between or a continuous density throughout. If, for the sake of brevity, a Stieltjes integral is used, nothing else is meant than the combination of a sum and an ordinary integral of a continuous function. With respect to the parameter the distributions $P(x \mid \vartheta)$ are considered as either defined for distinct $\vartheta$-values only or as continuous functions, etc.


2. Error chance. Success rate. J. Neyman, who must be credited with successfully promoting many problems of mathematical statistics, introduced the distinction between errors of first and second type and made this the basis of his approach in dealing with the theory of tests. An error of first kind is committed if the assertion $\vartheta \ne \vartheta_0$ is made when $\vartheta$ equals $\vartheta_0$; an error of second kind occurs when the assertion $\vartheta = \vartheta_0$ proves incorrect.$^{2}$ The chances $P_I$ and $P_{II}$ of these two events can easily be computed, if the distributions $P(x \mid \vartheta)$ and $P_0(\vartheta)$ are considered as known. From $P(x \mid \vartheta)$ we derive the probability $P(A \mid \vartheta)$ for $x$ falling in the region $A$. In particular $P(A \mid \vartheta_0)$ will be designated by $1 - \alpha$. Thus $\alpha$ is the probability of $x$ falling in $\bar A$ when $\vartheta = \vartheta_0$. The function $P_0(\vartheta)$ can have, at the point $\vartheta = \vartheta_0$, a jump of magnitude $\pi_0$. The set of all $\vartheta$-values except $\vartheta_0$ will be called $\bar H$. Then the two error chances are obviously

(1)    $P_I = \alpha\pi_0, \qquad P_{II} = \int_{(\bar H)} P(A \mid \vartheta)\, dP_0(\vartheta).$

By the integral over $\bar H$ is meant that the term $P(A \mid \vartheta_0)\pi_0$ in the summation has to be omitted. The formulae (1) show anew that it would be senseless to speak of error chances without assuming that an over-all distribution $P_0(\vartheta)$ exists.

In all papers that follow Neyman's line of thought first and second type error chances are discussed. But the formulae (1) are seldom written down.$^{3}$ It is incorrect to say that $\alpha$ is the chance of a first type error and it is likewise incorrect to say that the chance of a second type error depends on $\vartheta$; it depends on the distribution of $\vartheta$.

The total error chance is

(2)    $P_E = P_I + P_{II} = \alpha\pi_0 + \int_{(\bar H)} P(A \mid \vartheta)\, dP_0(\vartheta)$

and $1 - P_E$ is the success chance. If the distribution $P(x \mid \vartheta)$, the region of acceptance $A$, and the test value $\vartheta_0$ are given, $P_E$ depends on $P_0(\vartheta)$ only. If we make $P_0(\vartheta)$ coincide successively with all functions not excluded by some preliminary knowledge about the over-all distribution, there must exist a definite least upper bound (l.u.b.) of $P_E$ since $P_E$ has the upper bound 1. The value

$S = 1 - \text{l.u.b. } P_E$

is the greatest lower bound of the success chance. In other words, for any positive $\varepsilon$ there exists a $P_0(\vartheta)$ for which the success chance is less than $S + \varepsilon$ and $S$ is the greatest number for which this holds true. We therefore call $S$ the sure success rate or, briefly, the success rate for the test under consideration. If the success rate $S'$ for a region of acceptance $A'$ is greater than $S$, the test using $A'$ will be briefly called preferable to that using $A$.

2 See e.g. ref. [4], [5] or various other publications by the same author.
3 They are included e.g. in equation (1) of A. Wald's paper [5].

Neyman's approach consists in comparing two regions $A$ and $A'$ with the same $\alpha$. The difference of the respective error chances $P_E$ and $P'_E$ is according to (2):

(3)    $P_E - P'_E = \int_{(\bar H)} \left[P(A \mid \vartheta) - P(A' \mid \vartheta)\right] dP_0(\vartheta).$

This difference is non-negative, whatever is taken for $P_0(\vartheta)$, if for all values of $\vartheta$

(4)    $P(A \mid \vartheta) \ge P(A' \mid \vartheta).$

In this case $P_E \ge P'_E$ and l.u.b. $P_E \ge$ l.u.b. $P'_E$ and therefore $S \le S'$. If a region $A'$ can be found for which (4) holds for whatever $A$, Neyman calls the test using $A'$ a most powerful test. In fact, this test has at least as large a success rate as any other test using a region of acceptance with the same $\alpha$. Neyman does not use the concept of success rate as introduced here, but implicitly the success chance is the criterion underlying his analysis of tests.$^{4}$

The theory of most powerful tests would supply a complete solution of our problem, if (1) a most powerful test existed in all cases, i.e. for all distributions $P(x \mid \vartheta)$ and all $\vartheta_0$; and if (2) a sufficient indication how to choose $\alpha$ were given. Unfortunately it turns out that in almost no practical case a region $A'$ of this kind can be found. The various substitutes for a most powerful test as proposed by Neyman and others (unbiased test, test of type A, etc.) need not be discussed here, since it is obvious that nothing can be said about the difference $S - S'$, if (4) is not fulfilled for all $A$ and $\vartheta$. As to the choice of $\alpha$, the expression "level of significance" used by Neyman leaves it open whether a high or a low value of $\alpha$ is preferable.

4 This can be seen e.g. from the justification of most powerful tests as given by A. Wald [7] p. 15-16. Moreover, the recommendation of a test with highest success rate as the "best" (which is not the purpose of the present paper) could be justified from the standpoint of the general theory developed by Wald [6]. Wald introduces an arbitrary weight function for defining a "best" test. If the error weight is taken as one in the case of a false answer and as zero for each correct answer, Wald's "best" test coincides with the test of highest success rate. The present paper includes only statements that refer to the actual numbers of correct and false answers, independently of any arbitrary assumption about an error weight.


3. Preliminary example. Before attacking the general problem the discussion of a very simple example may provide some information. Let the distribution of the variate $X$ be given by the density

(5)    $p(x \mid \vartheta) = 1 + \vartheta^2\left(x^2 - \tfrac{1}{3}\right), \qquad 0 \le x \le 1.$

It is immediately seen that the integral of $p$ over the interval 0 to 1 equals 1 for each $\vartheta$ and that $p \ge 0$, if $\vartheta$ lies in the limits $-\sqrt{3}, \sqrt{3}$. Let this be the only information we possess about the over-all distribution $P_0(\vartheta)$. The value to be tested may be $\vartheta_0 = 0$. The density for this parameter value reduces to $p(x \mid 0) = 1$ and thus the probability of $x$ falling within the interval $x_1, x_2$ equals $x_2 - x_1$, if $\vartheta = \vartheta_0$. According to the notation introduced above we may consider as intervals of acceptance $A$ all intervals with the limits $x_1$, $x_1 + 1 - \alpha$, where $0 \le x_1 \le \alpha$.

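The function $P(A \mid \vartheta)$ is now given by

(6)    $P(A \mid \vartheta) = \int_{x_1}^{x_1 + 1 - \alpha} p(x \mid \vartheta)\, dx = 1 - \alpha + (1 - \alpha)\vartheta^2\left[x_1^2 + x_1(1 - \alpha) - \frac{\alpha(2 - \alpha)}{3}\right].$

In particular, for the interval $A'$ between 0 and $1 - \alpha$:

(7)    $P(A' \mid \vartheta) = 1 - \alpha - (1 - \alpha)\vartheta^2\, \frac{\alpha(2 - \alpha)}{3}.$

The difference of these two expressions is non-negative:

(8)    $P(A \mid \vartheta) - P(A' \mid \vartheta) = (1 - \alpha)\vartheta^2 x_1(x_1 + 1 - \alpha).$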

Thus the interval $0, 1 - \alpha$ is seen to be a most powerful one. The error chance of this test is according to (2):

(9)    $P'_E = \alpha\pi_0 + \int_{(\bar H)} \left[1 - \alpha - \vartheta^2(1 - \alpha)\frac{\alpha(2 - \alpha)}{3}\right] dP_0(\vartheta) = \alpha\pi_0 + (1 - \alpha)(1 - \pi_0) - (1 - \alpha)\frac{\alpha(2 - \alpha)}{3} \int_{(\bar H)} \vartheta^2\, dP_0(\vartheta).$

The last integral is non-negative and can approach zero indefinitely since the total amount $1 - \pi_0$ can be concentrated at a point $\vartheta \ne 0$ with $\vartheta^2 < \varepsilon$. Therefore the l.u.b. of $P'_E$ for given $\alpha$ and $\pi_0$ is

$\alpha\pi_0 + (1 - \alpha)(1 - \pi_0).$

On the other hand, this is a linear function of $\pi_0$ which takes its extreme values at the ends of its interval, $\pi_0 = 0$ and $\pi_0 = 1$. Thus the larger of the two values $\alpha$ and $1 - \alpha$ is the l.u.b. of $P'_E$, if $P_0(\vartheta)$ is subjected to no further restriction. The success rate of the test under consideration is accordingly the smaller of the two quantities $\alpha$ and $1 - \alpha$.

For $\alpha = 0.99$ or $\alpha = 0.01$ the success rate is 0.01. This means: If we use the most powerful test at a level of significance of either 99% or 1%, we risk in both cases that 99% of all assertions will be false. If $\alpha = \tfrac{1}{2}$, the success rate reaches its maximum value which is $\tfrac{1}{2}$ too. On the other hand it can be seen that each interval of length $\tfrac{1}{2}$ with not too large $x_1$ would lead to the same success rate. In fact, the error chance $P_E$ for the interval $x_1, x_1 + 1 - \alpha$ is according to (9) and (6)

(9')    $P_E = \alpha\pi_0 + (1 - \alpha)(1 - \pi_0) - (1 - \alpha)\left[\frac{\alpha(2 - \alpha)}{3} - x_1(x_1 + 1 - \alpha)\right] \int_{(\bar H)} \vartheta^2\, dP_0(\vartheta).$

Therefore, the same reasoning as before applies, if the factor in brackets is non-negative. This is the case for $\alpha = \tfrac{1}{2}$ if the interval begins at a point $x_1 \le \tfrac{1}{4}(\sqrt{5} - 1) = 0.309$. Among these intervals, that with $x_1 = 0$ can be considered as preferable since its success chance for any $P_0(\vartheta)$ is at least as high as that of any other interval.
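The figures of this example can be reproduced numerically. The sketch below assumes the density $p(x \mid \vartheta) = 1 + \vartheta^2(x^2 - \tfrac{1}{3})$ on $0 \le x \le 1$ with $\vartheta$ between $-\sqrt{3}$ and $\sqrt{3}$, uses the corresponding closed form for $P(A \mid \vartheta)$ on the interval from $x_1$ to $x_1 + 1 - \alpha$, and approximates the l.u.b. over $\vartheta$ by a maximum over a grid. The function names and the grid size are choices made here for illustration only.

```python
import math


def density(x, theta):
    """Assumed density of the example: 1 + theta^2 (x^2 - 1/3) on [0, 1]."""
    return 1.0 + theta * theta * (x * x - 1.0 / 3.0)


def acceptance_probability(x1, alpha, theta):
    """Closed form of P(A | theta) for the interval [x1, x1 + 1 - alpha]."""
    length = 1.0 - alpha
    bracket = x1 * x1 + x1 * length - alpha * (2.0 - alpha) / 3.0
    return length + length * theta * theta * bracket


def midpoint_integral(x1, alpha, theta, steps=20000):
    """Independent check of the closed form by midpoint-rule integration of the density."""
    width = (1.0 - alpha) / steps
    return sum(density(x1 + (i + 0.5) * width, theta) for i in range(steps)) * width


def success_rate(x1, alpha, grid_size=2001):
    """Smaller of 1 - alpha and 1 - beta, beta being the grid maximum of P(A | theta)."""
    root3 = math.sqrt(3.0)
    thetas = [-root3 + 2.0 * root3 * i / (grid_size - 1) for i in range(grid_size)]
    beta = max(acceptance_probability(x1, alpha, t) for t in thetas)
    return min(1.0 - alpha, 1.0 - beta)


if __name__ == "__main__":
    for x1, alpha in [(0.0, 0.01), (0.0, 0.5), (0.3, 0.5), (0.4, 0.5)]:
        print(x1, alpha, round(success_rate(x1, alpha), 4))
```

With $\alpha = \tfrac{1}{2}$ the success rate stays at $\tfrac{1}{2}$ as long as $x_1$ does not exceed $\tfrac{1}{4}(\sqrt{5} - 1)$ and drops below it for larger $x_1$, which is the behaviour described in the text.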

Now, let us assume that in the definition (5) of $p(x \mid \vartheta)$ the factor $\vartheta^2$ is replaced by some function $g(\vartheta)$ which takes positive and negative values (within $-3/2$ and 3) while $\vartheta$ varies from $-\sqrt{3}$ to $\sqrt{3}$. Then equation (6) shows that for any two intervals of acceptance $A$ and $A'$ the difference $P(A \mid \vartheta) - P(A' \mid \vartheta)$ changes its sign at least once with varying $\vartheta$. Thus no most powerful test interval exists. But, applying (9) and calling $g_1$ the (negative) minimum value of $g(\vartheta)$ we find now

$\alpha\pi_0 + (1 - \alpha)\left[1 - g_1\, \frac{\alpha(2 - \alpha)}{3}\right](1 - \pi_0)$

as the l.u.b. of the error chance of $A'$ for given $\alpha$ and $\pi_0$. Thus the smaller of the quantities

$1 - \alpha$ and $1 - (1 - \alpha)\left[1 - g_1\, \frac{\alpha(2 - \alpha)}{3}\right]$

is the success rate of the test using $A'$. If $g_1$ is given we can find, by differentiation, the value supplying the highest success rate. Using (9') instead of (9) we find in a similar way the success rates for any other interval. It turns out that $S = \tfrac{1}{2}$ for the interval extending from the above given value $x_1 = 0.309$ to 0.809.

There are three things we may learn from this example. (1) It can happen that a most powerful test, at a high or at a low level of significance, has an extremely poor success rate; (2) In the case where a most powerful test with the highest possible success rate exists, there may be other intervals with the same success rate; (3) If no most powerful test exists, there is no need to look for some substitute definition; the success rate for any kind of test can be found independently of its being most powerful or not.

4. General solution for a simple hypothesis. The distribution $P(x \mid \vartheta)$ of the variate $X$, the parameter value $\vartheta_0$ to be tested, and the set of all possible values of $\vartheta$ are supposed to be given. The set of all possible $\vartheta$-values except $\vartheta_0$ is called $\bar H$. Choose a region of acceptance $A$ and compute first, for all $\vartheta$, the magnitude

(10)    $P(A \mid \vartheta) = \int_{(A)} dP(x \mid \vartheta).$

In particular, the value of this integral for $\vartheta = \vartheta_0$ will be called $1 - \alpha$ and its maximum value or its l.u.b. on $\bar H$ will be denoted by $\beta$:

(11)    $P(A \mid \vartheta_0) = 1 - \alpha, \qquad \text{l.u.b.}_{(\bar H)}\, P(A \mid \vartheta) = \beta.$

The chance of committing an error in asserting $\vartheta = \vartheta_0$ when $x$ falls in $A$ or $\vartheta \ne \vartheta_0$ in the case $x$ falls in the complement $\bar A$ is according to (2)

$P_E = \alpha\pi_0 + \int_{(\bar H)} P(A \mid \vartheta)\, dP_0(\vartheta),$

where $\pi_0$ is the jump of $P_0(\vartheta)$ at the abscissa $\vartheta = \vartheta_0$, or the a priori chance of $\vartheta_0$. The total mass of $P_0$ on the domain of integration $\bar H$ is $(1 - \pi_0)$ and therefore $\beta(1 - \pi_0)$ the l.u.b. of the integral. Thus$^{5}$

$\text{l.u.b. } P_E = \max\,[\alpha\pi_0 + \beta(1 - \pi_0)].$

As $\pi_0$ can take all values between zero and one, the lowest upper bound of $P_E$ is either $\alpha$ or $\beta$. The success rate $S$, i.e. the greatest lower bound of $1 - P_E$, is consequently the smaller of the quantities $1 - \alpha$ and $1 - \beta$.

If the distribution $P(x \mid \vartheta)$ is given and a region of acceptance $A$ for a test value $\vartheta_0$ chosen, the success rate of this test equals the smaller of the two quantities

(12)    $1 - \alpha = P(A \mid \vartheta_0)$ and $1 - \beta = 1 - \text{l.u.b.}_{(\bar H)}\, P(A \mid \vartheta),$

if nothing is known about the initial distribution of the parameter except its range. Finding a region of acceptance, $A$, with the highest success rate, is then a simple maximum-minimum problem.
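The rule just stated lends itself to a direct computation. The following sketch is a minimal illustration for a parameter restricted to finitely many values: the acceptance probabilities in `table` are invented numbers, and the function names are hypothetical. It evaluates the success rate as the smaller of $1 - \alpha$ and $1 - \beta$ and compares it with the worst case of $\alpha\pi_0 + \beta(1 - \pi_0)$ over a grid of a priori weights $\pi_0$.

```python
def success_rate(accept_prob, theta0, alternatives):
    """Smaller of 1 - alpha = P(A | theta0) and 1 - beta, beta = max of P(A | theta) on the alternatives."""
    one_minus_alpha = accept_prob(theta0)
    beta = max(accept_prob(theta) for theta in alternatives)
    return min(one_minus_alpha, 1.0 - beta)


def worst_case_error(accept_prob, theta0, alternatives, grid_size=101):
    """Largest value of alpha*pi0 + beta*(1 - pi0) as the a priori weight pi0 runs over [0, 1]."""
    alpha = 1.0 - accept_prob(theta0)
    beta = max(accept_prob(theta) for theta in alternatives)
    weights = [i / (grid_size - 1) for i in range(grid_size)]
    return max(alpha * pi0 + beta * (1.0 - pi0) for pi0 in weights)


if __name__ == "__main__":
    # Invented acceptance probabilities P(A | theta) for three parameter values.
    table = {0: 0.8, 1: 0.3, 2: 0.1}
    rate = success_rate(table.get, theta0=0, alternatives=[1, 2])
    error = worst_case_error(table.get, theta0=0, alternatives=[1, 2])
    print(rate, 1.0 - error)
```

Because the bound is linear in $\pi_0$, its maximum is attained at $\pi_0 = 0$ or $\pi_0 = 1$, so one minus the worst-case error agrees with the success rate.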

This solution is not restricted to some rarely occurring type of distributions $P(x \mid \vartheta)$ and it is insofar a complete one as it does not leave undetermined the value of $\alpha$. Using Neyman's terminology we would have to say: The success rate is the smaller of the two quantities: 1 minus level of significance and minimum power of the test.

It follows from the definitions (12) that, if $P(A \mid \vartheta)$ is continuous in a $\vartheta$-interval including $\vartheta_0$, and $\vartheta$ is allowed to take all values of this interval, $\beta$ cannot be smaller than $1 - \alpha$:

$\beta \ge 1 - \alpha$ or $\alpha + \beta \ge 1.$

Thus $1 - \alpha$ and $1 - \beta$ cannot possibly both be greater than $\tfrac{1}{2}$. The greatest possible success rate is then $\tfrac{1}{2}$ and it can be reached only if $\alpha = \beta = \tfrac{1}{2}$. We state: No test can have a success rate $S$ greater than $\tfrac{1}{2}$, if $\vartheta$ can vary in an interval including $\vartheta_0$ without any restriction and $P(A \mid \vartheta)$ is a continuous function of $\vartheta$ in this interval.

We will see later, in sections 8 and 9, how certain restrictions imposed on $P_0(\vartheta)$ which are effective in some problems improve the success rate of a test.

5 This formula was given by Jackson [1] p. 148 for the "case when the set of alternatives is discontinuous." Jackson calls the test with highest success rate a "most stringent test."

5. Examples. Let us assume that the variate $X$ is normally distributed according to

(13)    $P(x \mid \vartheta) = \Phi[h(x - \vartheta)], \qquad \Phi(u) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{u} e^{-z^2}\, dz.$

The parameter value to be tested may be taken as $\vartheta_0 = 0$ without loss of generality, since in all other cases $X - \vartheta_0$ can be considered as the variate. If the interval $x_1, x_2$ is chosen for the region of acceptance, we have

(14)    $P(A \mid \vartheta) = \Phi[h(x_2 - \vartheta)] - \Phi[h(x_1 - \vartheta)].$

The right hand side becomes a maximum, if

$\Phi'[h(x_2 - \vartheta)] = \Phi'[h(x_1 - \vartheta)]$, i.e. $\vartheta = \tfrac{1}{2}(x_1 + x_2)$.

Therefore, for $\vartheta_0 = 0$

$1 - \alpha = \Phi(hx_2) - \Phi(hx_1), \qquad \beta = \Phi\left(\tfrac{1}{2}h(x_2 - x_1)\right) - \Phi\left(\tfrac{1}{2}h(x_1 - x_2)\right).$

Both quantities have the value $\tfrac{1}{2}$ if and only if

(15)    $x_1 = -x_2, \qquad \Phi(hx_1) = \tfrac{1}{4}, \qquad \Phi(hx_2) = \tfrac{3}{4}.$

These are the probable limits of $x$. The conclusion is that the probable limits supply the interval with the highest possible success rate $S = \tfrac{1}{2}$.

The result is not restricted to the particular form of the function $\Phi$; it remains valid, if $\Phi$ is replaced by any function whose derivative $\Phi'$ has one maximum and decreases both ways symmetrically. It is well known that this test, which has always been used by statisticians and is here proved to have the maximum success rate, is neither most powerful nor even, for a general $\Phi$, unbiased. We also see that the interval determined by (15) is the only closed interval with maximum success rate.
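The probable limits can be computed with the error function of the Python standard library. The sketch below assumes the normalization $\Phi(u) = \pi^{-1/2}\int_{-\infty}^{u} e^{-z^2}\,dz$, which equals $\tfrac{1}{2}(1 + \operatorname{erf} u)$, solves $\Phi(hx_2) = \tfrac{3}{4}$ by bisection, and evaluates $P(A \mid \vartheta)$ on the symmetric interval. Function names, the bracketing interval and the tolerance are illustrative choices.

```python
import math


def phi(u):
    """Phi(u) = pi^(-1/2) * integral of exp(-z^2) from -infinity to u, via erf."""
    return 0.5 * (1.0 + math.erf(u))


def accept_prob(x1, x2, theta, h=1.0):
    """P(A | theta) for the interval [x1, x2] and precision constant h."""
    return phi(h * (x2 - theta)) - phi(h * (x1 - theta))


def probable_limit(h=1.0, tol=1e-12):
    """Bisection for the positive x2 with phi(h * x2) = 3/4."""
    lo, hi = 0.0, 10.0 / h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(h * mid) < 0.75:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x2 = probable_limit(h=1.0)
    print(x2, accept_prob(-x2, x2, 0.0))
```

For $h = 1$ the limit is the familiar constant of the probable error, roughly 0.4769, and $P(A \mid \vartheta)$ on the interval $(-x_2, x_2)$ is largest at $\vartheta = 0$, where it equals $\tfrac{1}{2}$.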

Our method supplies the analogous solution for the case of an unsymmetric distribution also. Assume the density

(16)    $p(x \mid \vartheta) = f(x - \vartheta),$

where $f(u)$ is supposed to have only one maximum, say at the point $u = 0$. The value to be tested may again be chosen as $\vartheta_0 = 0$. For the interval $x_1, x_2$ as region of acceptance we have

$P(A \mid \vartheta) = \int_{x_1}^{x_2} f(x - \vartheta)\, dx = \int_{x_1 - \vartheta}^{x_2 - \vartheta} f(u)\, du.$

The last expression becomes a maximum with respect to $\vartheta$, if

$f(x_1 - \vartheta) = f(x_2 - \vartheta).$

The maximum will occur at the point $\vartheta = 0$ and accordingly coincide with $1 - \alpha$, if $f(x_1) = f(x_2)$. Thus we have a region of acceptance with the highest possible success rate if $x_1, x_2$ are determined by

(17)    $\int_{x_1}^{x_2} f(u)\, du = \tfrac{1}{2}, \qquad f(x_1) = f(x_2).$

Under the assumptions made for $f(u)$ there exists exactly one pair of values $x_1, x_2$ obeying these equations. This kind of test too has been much used by statisticians, but an account of its merits has so far not been given.
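Another example is supplied by the density function

(18)    $p(x \mid \vartheta) = \vartheta^2 x e^{-\vartheta x}, \qquad x \ge 0, \quad \vartheta > 0.$

We derive for an interval $x_1, x_2$

$P(A \mid \vartheta) = \int_{x_1}^{x_2} p(x \mid \vartheta)\, dx = (\vartheta x_1 + 1)e^{-\vartheta x_1} - (\vartheta x_2 + 1)e^{-\vartheta x_2}.$

If $\vartheta_0$ is the value to be tested, we have

(19)    $1 - \alpha = (\vartheta_0 x_1 + 1)e^{-\vartheta_0 x_1} - (\vartheta_0 x_2 + 1)e^{-\vartheta_0 x_2}.$

One may ask for an interval $x_1, x_2$ with the success rate $S = \tfrac{1}{2}$. Then equation (19) must be fulfilled with $\alpha = \tfrac{1}{2}$ and, moreover, $P(A \mid \vartheta)$ must take its maximum value at $\vartheta = \vartheta_0$. This provides the second condition

(19')    $\frac{\partial P(A \mid \vartheta)}{\partial \vartheta} = 0$ at $\vartheta = \vartheta_0$: $\quad x_1^2 e^{-\vartheta_0 x_1} = x_2^2 e^{-\vartheta_0 x_2}.$

There exists, for each $\vartheta_0 > 0$, one and only one pair of values $x_1, x_2$ obeying the two equations (19) and (19').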
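The pair $x_1, x_2$ can be found numerically. The following is a sketch under stated assumptions, not the author's procedure: it takes the density $\vartheta^2 x e^{-\vartheta x}$, for which $P(A \mid \vartheta) = (\vartheta x_1 + 1)e^{-\vartheta x_1} - (\vartheta x_2 + 1)e^{-\vartheta x_2}$, uses the fact that $x^2 e^{-\vartheta_0 x}$ rises up to $x = 2/\vartheta_0$ and falls afterwards, and applies nested bisection. The helper names and the bracketing constants are choices made here.

```python
import math


def bisect(func, lo, hi, iterations=200):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def accept_prob(x1, x2, theta):
    """P(A | theta) for the density theta^2 * x * exp(-theta * x) on [x1, x2]."""
    return ((theta * x1 + 1.0) * math.exp(-theta * x1)
            - (theta * x2 + 1.0) * math.exp(-theta * x2))


def matching_upper_limit(x1, theta0):
    """x2 > 2/theta0 with x2^2 exp(-theta0 x2) = x1^2 exp(-theta0 x1) (derivative condition)."""
    target = x1 * x1 * math.exp(-theta0 * x1)
    return bisect(lambda x: x * x * math.exp(-theta0 * x) - target,
                  2.0 / theta0, 80.0 / theta0)


def half_success_interval(theta0):
    """Interval with P(A | theta0) = 1/2 whose acceptance probability peaks at theta0."""
    def excess(x1):
        return accept_prob(x1, matching_upper_limit(x1, theta0), theta0) - 0.5
    x1 = bisect(excess, 1e-6 / theta0, (2.0 - 1e-6) / theta0)
    return x1, matching_upper_limit(x1, theta0)


if __name__ == "__main__":
    x1, x2 = half_success_interval(1.0)
    print(x1, x2, accept_prob(x1, x2, 1.0))
```

The outer bisection works because widening the interval on both sides (smaller $x_1$, hence larger $x_2$) can only increase $P(A \mid \vartheta_0)$, so the value $\tfrac{1}{2}$ is crossed exactly once.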

In all these examples it turned out that at least one interval with the success rate $S = \tfrac{1}{2}$ (the highest value for a distribution continuous with respect to $\vartheta$) exists. It seems that this is a common property of most usual distribution functions $P(x \mid \vartheta)$. But we can easily give an example where the greatest $S$, at least for a single interval as region of acceptance, is smaller than $\tfrac{1}{2}$. Assume

(20)    $P(x \mid \vartheta) = x + \vartheta x(1 - x)(2\vartheta^2 x - 1), \qquad 0 \le x \le 1, \quad -1 \le \vartheta \le 1,$

and let $\vartheta_0 = 0$ be the value subjected to testing. For any interval beginning at $x$ and extending to $x + 1 - \alpha$ we find

(21)    $P(A \mid \vartheta) = 1 - \alpha + a\vartheta + b\vartheta^3$ with $a = (1 - \alpha)(2x - \alpha)$, $\quad b = 2(1 - \alpha)(-3x^2 + 3\alpha x - \alpha^2 + \alpha - x).$

It is a necessary condition for a test with $S = \tfrac{1}{2}$, in the case of a differentiable $P(A \mid \vartheta)$, that the derivative of $P(A \mid \vartheta)$ vanishes at $\vartheta = \vartheta_0$. Thus we must have

(22)    $\frac{\partial P(A \mid \vartheta)}{\partial \vartheta} = a + 3b\vartheta^2 = 0$ for $\vartheta = 0$.

This shows that $2x - \alpha$ must be zero or $x = \tfrac{1}{2}\alpha$. On the other hand, for $\alpha = \tfrac{1}{2}$, $x = \tfrac{1}{4}$ the formula for $P(A \mid \vartheta)$ becomes

$P(A \mid \vartheta) = \tfrac{1}{2} + \tfrac{3}{16}\vartheta^3.$

Thus $P$ has an inflexion point at $\vartheta = 0$ and its maximum, $\beta$, must be greater than $\tfrac{1}{2}$. In the present example, as $\vartheta$ goes up to 1, we have $\beta = 11/16$ and the success rate is $S = 5/16$. This does not exclude that intervals with a success rate between $5/16$ and $\tfrac{1}{2}$ exist. E.g. for $x = 0.45$ and $\alpha = \tfrac{1}{2}$ one finds the maximum $\beta = 0.60$ and thus $S = 0.40$. The optimum interval can be found by differentiating the formula for $P(A \mid \vartheta)$ with respect to $x$ and $\alpha$.
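The two numerical cases quoted for this example can be checked with a short grid search. The sketch assumes the distribution function $P(x \mid \vartheta) = x + \vartheta x(1 - x)(2\vartheta^2 x - 1)$ and the cubic $P(A \mid \vartheta) = 1 - \alpha + a\vartheta + b\vartheta^3$ with $a = (1 - \alpha)(2x - \alpha)$ and $b = 2(1 - \alpha)(-3x^2 + 3\alpha x - \alpha^2 + \alpha - x)$; function names and grid size are illustrative.

```python
def cdf(x, theta):
    """Assumed distribution function of the example on 0 <= x <= 1, -1 <= theta <= 1."""
    return x + theta * x * (1.0 - x) * (2.0 * theta * theta * x - 1.0)


def accept_prob(x, alpha, theta):
    """Cubic form of P(A | theta) for the interval [x, x + 1 - alpha]."""
    a = (1.0 - alpha) * (2.0 * x - alpha)
    b = 2.0 * (1.0 - alpha) * (-3.0 * x * x + 3.0 * alpha * x - alpha * alpha + alpha - x)
    return 1.0 - alpha + a * theta + b * theta ** 3


def success_rate(x, alpha, grid_size=4001):
    """Smaller of 1 - alpha and 1 - beta, beta being the grid maximum over theta in [-1, 1]."""
    thetas = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    beta = max(accept_prob(x, alpha, t) for t in thetas)
    return min(1.0 - alpha, 1.0 - beta)


if __name__ == "__main__":
    print(success_rate(0.25, 0.5))   # interval with vanishing derivative at theta = 0
    print(success_rate(0.45, 0.5))   # shifted interval of the same length
```

The first call gives $5/16$; the second gives a value close to the rounded figure $S = 0.40$ quoted in the text.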

Examples with the $\vartheta$ restricted to distinct values will be discussed in section 8.

6. Composite hypotheses. We have the problem of testing a composite hypothesis, if instead of one value $\vartheta_0$ a region $H$ of $\vartheta$-values is given and the assertions to be made in the course of experiments are "$\vartheta$ belongs to $H$" or "$\vartheta$ does not belong to $H$." The solution developed in section 4 applies to this case almost without modification.

Again, let $P(A \mid \vartheta)$ be the probability of $x$ falling in the region of acceptance $A$. By $\bar A$ and $\bar H$ we denote the regions complementary to $A$ in the sample space and to $H$ in the $\vartheta$-space. Then the error chance is

(23)    $P_E = \int_{(H)} \left[1 - P(A \mid \vartheta)\right] dP_0(\vartheta) + \int_{(\bar H)} P(A \mid \vartheta)\, dP_0(\vartheta).$

This is an obvious generalisation of (2). The equation expresses the fact that each time $x$ falls in $\bar A$ and $\vartheta$ in $H$ or $x$ in $A$ and $\vartheta$ in $\bar H$, an error is committed. Let us use the notations

(24)    $\pi_0 = \int_{(H)} dP_0(\vartheta), \qquad \alpha = \text{l.u.b. of } P(\bar A \mid \vartheta) \text{ for } \vartheta \text{ in } H, \qquad \beta = \text{l.u.b. of } P(A \mid \vartheta) \text{ for } \vartheta \text{ in } \bar H.$

Then the first of the two integrals in (23) cannot be greater than $\alpha\pi_0$ and the second not greater than $\beta(1 - \pi_0)$. On the other hand no lower upper bound exists for either of these integrals, if $\pi_0$ is given and $P_0(\vartheta)$ subjected to no other restriction.

As $\pi_0$ varies between 0 and 1, the expression

$\alpha\pi_0 + \beta(1 - \pi_0)$

has its extreme values at the points $\pi_0 = 0$ and $\pi_0 = 1$ and these values are $\alpha$ and $\beta$. Accordingly the greater of the quantities $\alpha$ and $\beta$ is the l.u.b. of $P_E$ and the success rate $S$ equals the smaller of the two quantities $1 - \alpha$ and $1 - \beta$. If $P(A \mid \vartheta)$ is continuous with respect to $\vartheta$, we have again $\beta \ge 1 - \alpha$, thus $\alpha$ and $\beta$ cannot be both smaller than $\tfrac{1}{2}$ and no $S$ can become $> \tfrac{1}{2}$.

If the hypothesis that $\vartheta$ lies in $H$ is tested by means of a region of acceptance $A$, the success rate of this test equals the smaller of the two quantities $1 - \alpha$ and $1 - \beta$ which are the minimum of $P(A \mid \vartheta)$ for $\vartheta$-values in $H$ and the minimum of $P(\bar A \mid \vartheta)$ for $\vartheta$-values outside $H$. The task of finding the region $A$ with highest success rate is thus reduced to a simple maximum-minimum problem.

As an example let us take the density function

(25)  $p(x \mid \vartheta) = f(x - \vartheta),$

where f(u) has a maximum at u = 0 and drops on both sides symmetrically and
monotonically towards zero. The hypothesis to be tested may be given as

$-b \le \vartheta \le b.$

We find, if the interval x₁, x₂ is taken for region of acceptance:

(26)  $P(A \mid \vartheta) = \int_{x_1}^{x_2} f(x - \vartheta)\, dx = \int_{x_1 - \vartheta}^{x_2 - \vartheta} f(u)\, du.$

This function of ϑ has its maximum at ϑ = ½(x₁ + x₂) and drops symmetrically
on both sides. If ½(x₁ + x₂) is supposed to lie in the interval (0, b) we find

$\alpha = 1 - \int_{x_1 + b}^{x_2 + b} f(u)\, du, \qquad \beta = \int_{x_1 - b}^{x_2 - b} f(u)\, du.$

Both quantities reach the value ½, if we choose x₂ = −x₁ = a and take for a
the uniquely determined solution of

$\int_{-a+b}^{a+b} f(u)\, du = \int_{-a-b}^{a-b} f(u)\, du = \tfrac{1}{2}.$

For this interval the success rate has its highest possible value ½.
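
The determination of a can be illustrated numerically. The following is only a
sketch under assumptions that are not in the text: f is taken to be the standard
normal density, b = 1 is an arbitrary choice, and the function names are
hypothetical. It solves the last equation for a by bisection and confirms that
α and β both come out as ½.

```python
import math


def normal_cdf(u):
    """Cumulative distribution of the standard normal density, used here as f."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def acceptance_probability(a, theta):
    """P(A | theta) for the region -a <= x <= a when x has density f(x - theta)."""
    return normal_cdf(a - theta) - normal_cdf(-a - theta)


def half_width(b, tol=1e-12):
    """Bisection for the a at which the integral of f from -a+b to a+b equals 1/2."""
    lo, hi = 0.0, b + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # By symmetry of f, P(A | b) equals the integral from -a+b to a+b.
        if acceptance_probability(mid, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b = 1.0                                        # assumed hypothesis: -b <= theta <= b
a = half_width(b)
alpha = 1.0 - acceptance_probability(a, -b)    # l.u.b. of P(not A | theta) inside H
beta = acceptance_probability(a, b)            # l.u.b. of P(A | theta) outside H
success_rate = min(1.0 - alpha, 1.0 - beta)
print(a, alpha, beta, success_rate)
```

Any other symmetric, monotonically decreasing f could be substituted for the
normal density; only `normal_cdf` would change.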

7. Case of n variates and k parameters. The analysis given in section 4 for
a simple hypothesis and in 6 for a composite one extends immediately to the
case where instead of one variate X and one parameter ϑ a group of n variates
X₁, X₂, ⋯, Xₙ and a group of k parameters ϑ₁, ϑ₂, ⋯, ϑₖ are in question.
The region of acceptance A is now a portion of the n-dimensional sample space,
determined by an interval of a function F(x₁, x₂, ⋯, xₙ). The hypothesis
to be tested will consist in assuming that the point ϑ₁, ϑ₂, ⋯, ϑₖ falls into a
certain region H of the k-dimensional parameter space. The success rate of
such a test is again the smaller of the numbers 1 − α and 1 − β where α and β
are defined in exactly the same way as in the preceding section. The minimum
of P(A | ϑ) when the ϑ-values fall into H is called 1 − α, and the maximum
of the same function for all ϑ-combinations belonging to the complementary
region H̄ is β.

If the test function F(x₁, x₂, ⋯, xₙ) is known, the interval with the highest
success rate can be found on the same lines as in the case of one variate. In
fact, the quantity F takes the place of x in the former analysis. If the interval
thus found has the success rate ½ we know that no other test exists which would
have a higher success rate as long as nothing is known about the a priori
distribution in the parameter space. If a certain F(x₁, x₂, ⋯, xₙ) does not lead to
an interval with success rate ½ one may try another test function. In the most
general case the test function F with the highest success rate would be found
by solving the problem of calculus of variation that consists in maximizing
1 − α and 1 − β. As a rule such an elaborate analysis will not be necessary.

To ask that a test be a most powerful one is too much and too little. It is 
too much since such a test does not exist in most cases. It is too little because 
there can exist another test (on a different level of significance) with a con¬ 
siderably higher success rate. The correct description of a most powerful test 
is that such a test can be shown, in a simple way, to have no smaller success 
chance, whatever P₀(ϑ) is, than a group of other tests. If a most powerful test
exists, it may be considered preferable to all other tests of the same success rate, 
but there is no reason why it should be considered more favorable than any test 
with higher success rate. As to unbiased tests, and other substitutes for most 
powerful tests, nothing at all can be said about their merits as compared with 
that of other tests. 

A simple example for tests with the highest possible success rate in the case of
several dimensions is the following. Assume a density function

(28)  $p(x \mid \vartheta) = f(x_1 - \vartheta_1, x_2 - \vartheta_2, \cdots, x_n - \vartheta_n)$

where f(u₁, u₂, ⋯, uₙ) depends on the absolute values |u₁|, |u₂|, ⋯, |uₙ|
only and decreases monotonically with increasing u₁² + u₂² + ⋯ + uₙ² in all
directions. The parameter point ϑ₁ = ϑ₂ = ⋯ = ϑₙ = 0 is to be tested. Let
F(x₁, x₂, ⋯, xₙ) be a function likewise depending on |x₁|, |x₂|, ⋯, |xₙ| only,
vanishing at the origin, and monotonically increasing with x₁² + x₂² + ⋯ + xₙ².
Then the set of points for which

(29)  $F(x_1, x_2, \cdots, x_n) \le C$

is a region of acceptance with success rate ½, if C is chosen in such a way as to
have

(30)  $\int_{(F \le C)} f(x_1, x_2, \cdots, x_n)\, dx_1\, dx_2 \cdots dx_n = \tfrac{1}{2}.$

This applies e.g. to normal populations. The proof is obvious.

8. Distinct parameter values. Tests with higher success rate than ½ can be
found, if the parameter ϑ is restricted to a set of distinct values. Take for
instance our first example in section 3 and assume that ϑ can only take the
three values 0, ±1. Then in the second expression (9) for the error chance the
integral can not approach the value zero since the region H̄ does not include
the point & = 0. The minimum value of the integral is (1 — t 0 ) and thus 

(31) P' s ^ cxivo + (1 - a) [l - a(2 - ~ - a) j (1 - to). 

The success rate is the smaller of the two quantities 

1 - a and 1 - (1 ~ = 1 - 0. 

The best value of α is found by equating α and β. This gives about α = β =
0.436 and the success rate S = 0.564, for the region of acceptance x = 0 to
x = 0.564. Other intervals or sets of intervals can be examined in the same way.

A more impressive example is the following. We draw n = 12 times from an
urn which contains three balls, black ones and white ones. The observed value x
is the number of white balls drawn. The probability ϑ of getting a white ball
in one experiment can have one of the four values 0, 1/3, 2/3, 1, and we want
to test the hypothesis ϑ = ϑ₀ = 1/3. The probability distribution is given by

(32)  $p(x \mid \vartheta) = \binom{12}{x} \vartheta^{x} (1 - \vartheta)^{12 - x}.$

Let us choose the set of points x = 1, 2, ⋯, 6 as region of acceptance. Then

(33)  $P(A \mid \vartheta) = \sum_{x=1}^{6} \binom{12}{x} \vartheta^{x} (1 - \vartheta)^{12 - x}.$

This sum can be computed for the 4 possible ϑ-values:

P(A | ϑ) =   0      0.926    0.178    0

for ϑ =      0      1/3      2/3      1

Thus 1 − α has the value 0.926 and β equals 0.178. The success rate is the
smaller of the two quantities 0.926 and 0.822, thus S = 0.822. If we restrict
the region of acceptance to the points x = 1 to 5, the values of 1 − α and 1 − β
become 0.815 and 0.934, thus the success rate S = 0.815. In the first case we
have more than 82% chance of making a correct assertion, whatever the a priori
probability of ϑ may be.

It is obvious that this result will become more and more strongly marked, if
the number of observations increases. This is connected with the subject of
the next section.
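
The figures of the urn example can be checked directly from the binomial sum
(33). The sketch below is an illustration only; the function names are
hypothetical and not part of the paper.

```python
from math import comb


def acceptance_probability(theta, region, n=12):
    """P(A | theta): binomial probability that the number of white balls lies in region."""
    return sum(comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in region)


def success_rate(region, theta0=1 / 3, alternatives=(0.0, 2 / 3, 1.0)):
    """Smaller of 1 - alpha (under the hypothesis) and 1 - beta (worst alternative)."""
    one_minus_alpha = acceptance_probability(theta0, region)
    beta = max(acceptance_probability(t, region) for t in alternatives)
    return min(one_minus_alpha, 1 - beta)


s_1_to_6 = success_rate(range(1, 7))   # region x = 1, ..., 6
s_1_to_5 = success_rate(range(1, 6))   # region x = 1, ..., 5
print(round(s_1_to_6, 3), round(s_1_to_5, 3))   # prints 0.822 0.815
```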

9. Asymptotically increasing success rate. It seems strange that in the case
of a continuously varying parameter and a distribution P(x | ϑ) which is
continuous with respect to ϑ no test can have a success rate > ½. One has the feeling
that something might happen in the continuous problems similar to what was
the case in the example of section 8. On the other hand our proof that S ≤ ½,
in sections 4 and 6, is conclusive and it applies to problems in more than 1
dimension also. The answer is that in the kind of problems where a large number
of observations is involved a definite restrictive assumption about the over-all
distribution P₀(ϑ) is silently introduced.

The problems we have here in mind are connected with sequences of
distributions of the form

(34)  $P_n(x \mid \vartheta) = \Phi_n(x - \vartheta),$

where Φ₁(u), Φ₂(u), Φ₃(u), ⋯ are cumulative distribution functions for
distributions more and more concentrated around one point, say u = 0. In a rigorous
form the sequence Φₙ(u) can be described by the following statement: For each
ε, η > 0 there exists a number N(ε, η) such that

(35)  $\Phi_n(\eta) - \Phi_n(-\eta) \ge 1 - \varepsilon$  for  $n > N(\varepsilon, \eta).$

One wants to test the hypothesis

$-b \le \vartheta \le b,$

under the assumption that the parameter distribution does not depend on n. In
this case, as we shall show, one can find for each ε > 0 a region of acceptance A
such that the success rate Sₙ of the test corresponding to this A and to Pₙ(x | ϑ)
is greater than 1 − ε for sufficiently large n.

We divide the region H̄, i.e. |ϑ| > b, into two parts H̄₁ and H̄₂ where H̄₁
consists of the points |ϑ| ≤ b + 2η and satisfies the condition

(36)  $\int_{(\bar H_1)} dP_0(\vartheta) \le \frac{\varepsilon}{3}.$

Then the region of acceptance will be

$-a = -b - \eta \le x \le b + \eta = a,$

and the probability of x falling in this region:

(37)  $P_n(A \mid \vartheta) = \Phi_n(b + \eta - \vartheta) - \Phi_n(-b - \eta - \vartheta).$

As long as ϑ belongs to H the right hand side in (37) is not smaller than
Φₙ(η) − Φₙ(−η) and thus, according to (35), the error chance of first kind

(38)  $P_I^{(n)} = \int_{(H)} [1 - P_n(A \mid \vartheta)]\, dP_0(\vartheta) \le 1 - [\Phi_n(\eta) - \Phi_n(-\eta)] \le \frac{\varepsilon}{3}$  for  $n > N(\varepsilon/3, \eta).$

The error chance of second kind can be written as

(39)  $P_{II}^{(n)} = \int_{(\bar H_1)} P_n(A \mid \vartheta)\, dP_0(\vartheta) + \int_{(\bar H_2)} P_n(A \mid \vartheta)\, dP_0(\vartheta).$

The first of these integrals cannot be larger than ε/3 according to (36)
since Pₙ(A | ϑ) ≤ 1. The second integral cannot exceed the maximum value
of Pₙ(A | ϑ) for ϑ in H̄₂. But if |ϑ| > b + 2η the two arguments of Φₙ in (37)
have always the same sign and are in absolute value greater than η. It then
follows from (35), in connection with the fact that Φₙ(u) increases monotonically
from 0 to 1, that the difference of the two Φ-values cannot exceed ε/3 for
n > N(ε/3, η). Therefore

(40)  $P_{II}^{(n)} \le \frac{\varepsilon}{3} + \frac{\varepsilon}{3}$  and  $S_n = 1 - P_I^{(n)} - P_{II}^{(n)} \ge 1 - \varepsilon$  for  $n > N(\varepsilon/3, \eta).$

This result has a wide range of application in the cases where a hypothesis
is tested on the basis of a large number of independent observations. Consider
a sequence of variates X₁, X₂, X₃, ⋯ subject to probability distributions
Q₁(x), Q₂(x), Q₃(x), ⋯. Let x = F(x₁, x₂, ⋯, xₙ) be a statistical function,
i.e. a function depending on the distribution of its n variables only, and ϑ the
expected value of F. Then the general law of large numbers states that the
distribution of x has the form (34) with Φₙ satisfying the inequality (35), if the
Qₙ(x) fulfill certain conditions concerning mainly their behaviour at infinity.⁶
The proof of this theorem, which is the real source of most "asymptotical"
properties of statistical tests, was given for the first time in 1936. The particular
case where F is the arithmetical mean of the n variables x₁, x₂, ⋯, xₙ has been
known as Tchebychef's theorem since 1867.

Applying this general law of large numbers we can now state the following
fact. In testing a hypothesis about the expected value ϑ of any regular statistical
function of n variates we can reach a success rate 1 − ε, no matter how small ε is,
if the number n increases indefinitely and the initial distribution of ϑ is supposed
to be independent of n. On the other hand, no test with a success rate greater than
½ is available, if an assumption of this type is not used.
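
To give a feeling for the size of N(ε/3, η) in inequality (35), the following
sketch treats one special case. It assumes, beyond what the text states, that
Φₙ is the normal cumulative distribution with standard deviation σ/√n (as for the
arithmetical mean of n independent normal observations); the values of ε, η and σ
and the function names are arbitrary illustrations.

```python
import math


def concentration(n, eta, sigma=1.0):
    """Phi_n(eta) - Phi_n(-eta) when Phi_n is the normal c.d.f. with s.d. sigma / sqrt(n)."""
    return math.erf(eta * math.sqrt(n) / (sigma * math.sqrt(2.0)))


def smallest_n(eps, eta, sigma=1.0):
    """Smallest n for which inequality (35) holds with the given eps and eta."""
    n = 1
    while concentration(n, eta, sigma) < 1.0 - eps:
        n += 1
    return n


eps, eta = 0.03, 0.1                      # assumed tolerance and margin
n_required = smallest_n(eps / 3.0, eta)   # first n satisfying (35) with eps/3
print(n_required, concentration(n_required, eta))
```

In this special case `concentration` increases with n, so (35) holds for every
larger n as well, and with the acceptance region −b − η ≤ x ≤ b + η the success
rate exceeds 1 − ε provided η also satisfies condition (36).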


⁶ For exact conditions see ref. [3].

10. Summary. In this paper a solution of the problem of testing hypotheses
is presented in the following sense. It is assumed that a probability distribution
depending on some parameters is given and that nothing is known about the
initial distribution of these parameters. For any simple or composite hypothesis
about the parameters and any region of acceptance chosen in the sample space
the success rate S is computed, i.e. the minimum chance for getting right answers
out of the test. From the formulae given for S a test with highest success rate
can easily be found in each case.

This theory shares the point of departure with the theory actually in use, which
leads to the concept of most powerful tests. A most powerful test is described
as a test which, by simple reasoning, can be seen to have no smaller success
chance than any other test on the same "level of significance" α. In the rare
cases where most powerful tests exist for all α-values, one of them, with an
α-value singled out by our theory, has the highest success rate and then is
preferable to all other tests which might have the same success rate. In all other
cases our method supplies a test of highest success rate in no relation to
"unbiased" tests or other current substitutes for most powerful tests.

Some of the main results are: No test has a success rate > ½, if nothing is
known about the parameters except the limits of their values and if the
given distribution is a continuous function of the parameters. The success rate
can be higher, if the parameters are restricted to certain distinct values. A
success rate no matter how close to 1 can be reached in a sequence of tests based
on an increasing number n of observations, if the initial distribution of the
parameters is known to be independent of n.

REFERENCES

[1] Robert W. B. Jackson, "Tests of statistical hypotheses in the case when the set of
alternatives is discontinuous, illustrated on some genetical problems," Stat.
Res. Mem., Vol. 1 (1936), pp. 138-161.

[2] R. v. Mises, "On the correct use of Bayes' formula," Annals of Math. Stat., Vol. 13
(1942), pp. 156-165.

[3] R. v. Mises, "Die Gesetze der grossen Zahl für statistische Funktionen," Monatsh.
Mathem. u. Physik, Vol. 43 (1936), pp. 105-128.

[4] J. Neyman, "Sur la vérification des hypothèses statistiques composées," Bull. Soc.
Math. de France, Vol. 63 (1935), pp. 246-266.

[5] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability," Phil. Trans., Ser. A, Vol. 236 (1937), pp. 333-380.

[6] A. Wald, "Contributions to the theory of statistical estimation and testing
hypotheses," Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.

[7] A. Wald, "On the principles of statistical inference," 1942, Notre Dame Lect. No. 1.



ON THE RELIABILITY OF THE CLASSICAL CHI-SQUARE TEST 

By E. J. Gumbel 
New School for Social Research 

For a given set of observations and for a continuous variate, different
classifications lead to different observed distributions and to different values of χ².
This shortcoming has been vaguely felt by statisticians. We shall explain how
these differences arise and show that they are important enough to cast a great
deal of doubt on the validity of the application of the usual χ² method to a
continuous variate. Finally, we propose a procedure which is free from these
difficulties.

1. The observed distributions. The χ² method gives a numerical measure of
the differences between the observed and the theoretical distribution. A
theoretical distribution is completely determined once the constants are known.
For a discontinuous variate the observed distribution is also well defined; but
for a continuous variate the concept "observed distribution" is vague. To
classify N observations, x₁, x₂, ⋯, x_m, ⋯, x_N arranged in increasing order, we
introduce two arbitrary actions: the choice of the intervals and the beginning
of the first cell. As a rule, all cells have the same length, and they are bounded
by integral numbers, or even numbers, or round numbers, 0, 5, 10, of the variate.
But these classifications and the preference given to round numbers for the
starting point have no theoretical foundation.

A certain guide for the systematic choice of the class length and the beginning
of the first cell may be found by turning to the theory. Many theoretical
distributions of a continuous variate x have only two constants, and permit the
introduction of a reduced variate y with the dimension zero, where

(1)  $y = \dfrac{x - a}{b}.$

The constant a is a mean, and b is a measure of dispersion. The probabilities
W(x) (or F(y)) for values equal to or less than x (or y) are

(2)  $W(x) = F(y).$

For most distributions, for which the above transformation is possible, tables
for F(y) exist, in which the argument progresses by a fixed interval Δy. By
taking an initial value y₀ and a fixed interval Δy, the differences

(3)  $N F(y_0 + i \Delta y) - N F(y_0 + (i - 1) \Delta y) = N p_i, \qquad (i = 1, 2, \cdots, k)$

may be interpreted as being the theoretical distribution. The corresponding
values of the variate, by (1), are

(4)  $x(i) = a + b (y_0 + i \Delta y); \qquad x(i - 1) = a + b (y_0 + (i - 1) \Delta y)$

and the cell length is

(5)  $\Delta x = b\, \Delta y.$

In (3) k is the number of cells. In general, x(i) and x(i − 1) will not exist among
the observed values x_m. By arranging the observations in the cells given by
the theoretical values (4), we obtain an observed distribution consisting of the
contents a_i of the cell i. This procedure prescribes a classification of the
observations according to the theory. The intervals selected are multiples of some
measure of dispersion. In principle, the choice of Δy and of the starting point
y₀ remain arbitrary; in practice, the selection of Δy is limited by the intervals
given in the probability tables.

This natural classification may be used for constructing different observed
distributions from the same set of observations. We determine the constants,
then choose a small interval and a starting point which is below the smallest
observation. The last cell is such that it contains the largest observation x_N.
In this way, we obtain the initial observed distribution, consisting of k cells.
If we combine h cells (h = 2, 3, ⋯, ½k), we obtain h different observed
distributions: We combine h − 1 void cells with the first cell of the initial
distribution, we combine the second cell and the following h − 1 cells of the initial
distribution, and so on. Generally, we combine q void cells (q = h − 1, h − 2, ⋯,
1, 0) with the first h − q cells of the initial distribution, then the next h cells of
the initial distribution, and so on. The last of these h distributions starts with
the first h cells of the initial distribution.

If we combine more and more cells, the number of observed distributions,
having the same intervals, increases. The larger the intervals the larger is the
influence of the starting point, and the more the observed distributions become
dissimilar. To see this influence of classification on the shape of the observed
distributions, consider the extreme case for a symmetrical theoretical
distribution of an unlimited variate. Let the observed distribution consist of two cells.
Assume besides that the observed median is close to the theoretical one. If the
cut between the cells is identical with the theoretical median, the two cells have
the contents ½N + ε and ½N − ε, where ε is small. If the cut is shifted
sufficiently far to the left or right of the median, the cell contents will be 0, N and
N, 0. These two distributions are completely different.

To each observed distribution corresponds a theoretical one obtained from 
(3) by the same combination of cells as the observed distribution. In the graphi¬ 
cal representation, the same continuous theoretical distribution may be used for 
all observed distributions by choosing the scale of the ordinate properly. The 
length chosen for representing one observation in the initial distribution will 
represent h observations for the h distributions obtained by the combination 
of h cells. 

The different observed distributions corresponding to the same observations
and to the same theory will give different values of

(6)  $\chi^2 = \sum_{i=1}^{k} \dfrac{(a_i - N p_i)^2}{N p_i}.$

The expected contents of the first and last cell are

(7)  $N p_1 = N F(y_0 + \Delta y),$

(8)  $N p_k = N (1 - F(y_0 + (k - 1) \Delta y)).$

Since the total expected frequency must be equal to the number of observations

(9)  $\sum_{i=1}^{k} N p_i = \sum_{i=1}^{k} a_i,$

formula (6) may be written

(10)  $\chi^2 = \sum_{i=1}^{k} \dfrac{a_i^2}{N p_i} - N.$

This formula, being simpler than (6), will be used in the numerical example.
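
That (6) and (10) agree whenever the cell probabilities add up to one can be
checked in a few lines. The cell contents and probabilities below are
hypothetical figures chosen for illustration, not data from the paper, and the
function names are likewise assumptions.

```python
def chi_square_direct(observed, expected):
    """Formula (6): sum of (a_i - N p_i)^2 / (N p_i)."""
    return sum((a - e) ** 2 / e for a, e in zip(observed, expected))


def chi_square_short(observed, expected):
    """Formula (10): sum of a_i^2 / (N p_i), minus N."""
    n = sum(observed)
    return sum(a * a / e for a, e in zip(observed, expected)) - n


observed = [8, 12, 19, 4, 7]                 # hypothetical contents a_i, N = 50
probs = [0.12, 0.34, 0.29, 0.15, 0.10]       # hypothetical p_i, adding up to 1
n = sum(observed)
expected = [n * p for p in probs]

chi_6 = chi_square_direct(observed, expected)
chi_10 = chi_square_short(observed, expected)

# Extreme case: all N observations fall into the first cell.
extreme = chi_square_direct([n, 0, 0, 0, 0], expected)
upper_limit = n / probs[0] - n               # N / p_j - N for j = 1
print(chi_6, chi_10, extreme, upper_limit)
```

The last two printed numbers coincide, which is the statement that one cell
holding all observations yields χ² = N/p_j − N.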

An upper limit for χ² is furnished by the case that one cell j contains all
observations. Then

$a_j = N; \qquad a_i = 0 \text{ for } i \ne j,$

whence from (10)

(11)  $0 \le \chi^2 \le \dfrac{N}{p_j} - N.$

The upper limit depends again upon the intervals and the starting point of the
classification. If the probability for an observation to be contained in the cell
j is small, the upper limit is large.

The exact distribution of χ² has not yet been established. To obtain an
approximation, it is assumed that a binomial distribution may be replaced by a
normal distribution. As this does not hold for cells with a small expected
frequency, the contents of such cells must be combined. This prescription, which
is also valid for a discontinuous variate, constitutes a third arbitrary action in
the calculation of χ². It invalidates the prior postulate that all cells ought to
have the same length.

The approximation used for the probability P of obtaining a value of χ², equal
to or larger than the observed one, is

(12)  $P = \dfrac{1}{2^{\nu/2}\, \Gamma(\nu/2)} \int_{\chi^2}^{\infty} t^{\nu/2 - 1} e^{-t/2}\, dt,$

where ν is the number of degrees of freedom. Since

(13)  $\dfrac{\partial P}{\partial \chi^2} < 0; \qquad \dfrac{\partial P}{\partial \nu} > 0,$

P diminishes as χ² increases, ν being given, but P increases as ν increases, χ²
being given. By choosing larger cells, the number ν diminishes, and P may
remain the same if χ² diminishes adequately.

It is easy to see that χ² cannot increase as a result of the combination of cells
and will, in general, decrease. Let a₁ and a₂ represent the actual number of
observations in two cells that are to be combined. Let Np₁ and Np₂ be the
expected numbers. Then, the contribution of the two separate cells to χ² minus
the contribution of the two combined cells is, by (10)

$\dfrac{a_1^2}{N p_1} + \dfrac{a_2^2}{N p_2} - \dfrac{a_1^2 + 2 a_1 a_2 + a_2^2}{N (p_1 + p_2)}.$

As a₁ and a₂ are positive or zero, the difference is proportional, with the
positive factor 1/(N p₁ p₂ (p₁ + p₂)), to

$a_1^2 p_2^2 + a_2^2 p_1^2 - 2 a_1 a_2 p_1 p_2 = (a_1 p_2 - a_2 p_1)^2 \ge 0.$

The equality holds only when a₁ : a₂ = p₁ : p₂. Then, the combination of cells has
no influence on χ², but it reduces the number of degrees of freedom by one, and
diminishes the probability P. In the general case, the combination of cells
diminishes χ² and diminishes ν at the same time. According to (13), the first
influence tends to increase the probability P, the second to diminish it. It
cannot be stated a priori which influence is stronger.
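
The effect of combining two cells can be verified numerically. The sketch below
uses hypothetical cell contents and probabilities (not data from the paper) and
hypothetical function names; it compares the actual drop in χ² with the
expression (a₁p₂ − a₂p₁)² / (N p₁ p₂ (p₁ + p₂)) obtained from the identity above.

```python
def chi_square(observed, probs):
    """Formula (10): sum of a_i^2 / (N p_i), minus N."""
    n = sum(observed)
    return sum(a * a / (n * p) for a, p in zip(observed, probs)) - n


def combine(values, i):
    """Merge cell i with cell i + 1 by adding their entries."""
    return values[:i] + [values[i] + values[i + 1]] + values[i + 2:]


observed = [8, 12, 19, 4, 7]                 # hypothetical contents a_i
probs = [0.12, 0.34, 0.29, 0.15, 0.10]       # hypothetical p_i
n = sum(observed)

before = chi_square(observed, probs)
after = chi_square(combine(observed, 0), combine(probs, 0))

a1, a2 = observed[0], observed[1]
p1, p2 = probs[0], probs[1]
predicted_drop = (a1 * p2 - a2 * p1) ** 2 / (n * p1 * p2 * (p1 + p2))
print(before, after, before - after, predicted_drop)
```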

For a given set of observations, a continuous variate and a given theory, which
includes given estimates of the constants, the probability P depends upon three
arbitrary actions. If a certain choice of the intervals gives a good fit, it cannot
be concluded that a broader classification gives the same or a better fit [4]. For
a given interval, P may vary considerably with the starting point. This
influence cannot be allowed for by any formula as the number of degrees of freedom
does not depend upon the starting point. Finally, the term "small expected
numbers" is vague. Different combinations of cells lead to different
probabilities. It is generally assumed that these influences remain within reasonable
limits and that P does not vary considerably if we change the class length or the
starting point. In the following example, we shall show that this opinion is
erroneous.

2. Numerical example. The flood discharge of the Mississippi River at
Vicksburg for each of the fifty years 1890-1939 will be used to illustrate the extent to
which the observed distributions and P vary with the choice of cell length and
the starting point. The observed flood discharges x_m measured in 1,000 cubic
feet per second are given in Table VI of a previous article [2], and are not
repeated here. The expected distribution is given by the theory of largest values
which states that the probability 𝔚(x) of a flood discharge equal to or less than
x is

(14)  $\mathfrak{W}(x) = e^{-e^{-\alpha (x - u)}}.$

Values of 𝔚(x) as a function of the reduced variate

(15)  $y = \alpha (x - u),$

are given in Table II of the reference first cited.

Calculation of the constants α and u leads to the theoretical value of the flood
discharge

(16)  $x = 1201.9 + 266.1\, y$

associated with a given probability F(y) = 𝔚(x).

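TABLE I

Observed and theoretical distribution (1) for the interval Δy = .25; Δx = 66.525

              Variates                          Distributions
  Reduced y            Absolute x            Observed   Theoretical
                                               a_i         Np_i
     (1)                  (2)                  (3)          (4)

  ≤ −1.50              736.2 to  802.8          1           .5655
  −1.50 to −1.25       802.8 to  869.3          1           .959
  −1.25 to −1.00       869.3 to  935.8          3          1.775
  −1.00 to  −.75       935.8 to 1002.3          3          2.720
   −.75 to  −.50      1002.3 to 1068.9          5          3.5955
   −.50 to  −.25      1068.9 to 1135.4          1          4.2315
   −.25 to   .00      1135.4 to 1201.9          3          4.5475
    .00 to   .25      1201.9 to 1268.4          3          4.554
    .25 to   .50      1268.4 to 1334.9          3          4.314
    .50 to   .75      1334.9 to 1401.5          6          3.914
    .75 to  1.00      1401.5 to 1468.0          6          3.434
   1.00 to  1.25      1468.0 to 1534.6          4          2.934
   1.25 to  1.50      1534.6 to 1601.1          2          2.4565
   1.50 to  1.75      1601.1 to 1667.6          0          2.0235
   1.75 to  2.00      1667.6 to 1734.1          2          1.647
   2.00 to  2.25      1734.1 to 1800.6          0          1.3270
   2.25 to  2.50      1800.6 to 1867.2          2          1.0615
   2.50 to  2.75      1867.2 to 1933.7          2           .844
   2.75 to  3.00      1933.7 to 2000.2          0           .668
   3.00 to  3.25      2000.2 to 2066.7          2           .527
   3.25 to  3.50      2066.7 to 2133.3          0           .414
   3.50 to  3.75      2133.3 to 2199.8          0           .325
   3.75 to  4.00      2199.8 to 2266.3          0           .255
   4.00 to  4.25      2266.3 to 2332.8          0           .1995
  ≥ 4.25               above 2332.8             1           .708

  Total                                        50         50.000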

The first observed distribution presented in Table I is obtained by letting
Δy = .25; Δx = 66.525 and y₀ = −1.75. The expected number of observations
for the first and last cell are 50F(−1.5) and 50(1 − F(4.25)) respectively.
The expected frequencies (formula 3) for the other cells

$N p_i = 50 [F(y + .25) - F(y)],$

were obtained by successive subtraction of two consecutive figures given in
column 2, Table II [2]. The theoretical and the observed distribution are
plotted in figure 1. The observed distribution given in Table I is very irregular.
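
The theoretical column of Table I can be recomputed from (14) to (16) without
the printed probability table. The sketch below evaluates F(y) = exp(−exp(−y))
directly, so small differences in the last decimal against the tabulated values
are to be expected; the variable names are hypothetical.

```python
import math


def F(y):
    """Probability (14) written in the reduced variate y of (15)."""
    return math.exp(-math.exp(-y))


N = 50
# Cell limits in the reduced variate: -1.50, -1.25, ..., 4.25 (24 limits, 25 cells).
edges = [-1.5 + 0.25 * i for i in range(24)]

expected = [N * F(edges[0])]                                    # first cell: y <= -1.50
expected += [N * (F(edges[i + 1]) - F(edges[i])) for i in range(23)]
expected += [N * (1.0 - F(edges[-1]))]                          # last cell: y >= 4.25

# Corresponding limits of the flood discharge, by (16).
x_edges = [1201.9 + 266.1 * y for y in edges]

for y, x, np_i in zip(edges, x_edges, expected[1:]):
    print(f"{y:6.2f} {x:8.1f} {np_i:8.4f}")
```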

Evidently, the intervals are too small. Therefore, we construct the observed
and theoretical distributions (2) and (3) for cells which are two times larger.

The first cell in distribution (2) is obtained from distribution (1) by combining
the first cell of (1) with the empty one before it; the second cell is obtained by
combining the second and third cells of (1); and so on.

Distribution No. 3 is obtained by combining the first two cells of distribution
No. 1, then the third and fourth, and so on. The observed distributions 2 and 3
and the theoretical distribution are plotted in figure 2. The scale of the ordinate
is ½ of the scale in figure 1. In the same way, the three observed distributions
(4), (5), (6) for the interval Δy = ¾, Δx = 199.57 are obtained by combining
either two void cells with the first cell of Table I, or one void cell with the first
and second cell of Table I, or the first three cells of Table I (see fig. 3).

Finally, the four observed distributions (7), (8), (9), (10) for the interval
Δy = 1; Δx = 266.1 are compared with the theoretical distribution in figures
4 and 5. The four distributions 7-10 differ considerably. Distributions 8 and
9 indicate that the agreement between theory and observations is good,
distributions 7 and 10 indicate that the fit is bad. The χ² method must give the
same contradictory results.


TABLE II

Four values of P(χ²) for the same observations and the same theory

  Mid-      Observed distributions    Theoretical       Components of χ² + N
  points    (7)   (10)   (9)   (8)    distributions   (7)      (10)     (9)      (8)
                                          Np_i
   (1)      (2)   (3)    (4)   (5)        (6)         (7)      (8)      (9)      (10)

   803                    5              3.2995                        7.577
   869             8                     6.0195               10.632
   936      13                           9.6150      17.577
  1002                         14       13.8465                                 14.155
  1069                   12             15.0945                        9.540
  1135            12                    16.9285                8.506
  1202      10                          17.6470       5.667
  1268                         15       17.3295                                 12.984
  1335                   18             16.2160                       19.980
  1401            19                    14.5960               24.733
  1468      18                          12.7385      25.435
  1534                         12       10.8480                                 13.274
  1601                    8              9.0610                        7.063
  1667             4                     7.4540                2.146
  1734       4                           6.0590       2.641
  1800                          6        4.8795                                  7.378
  1867                    7              6.3290                        7.742
  1933             7                     5.0020                9.796
  2000       5                           3.9405       6.344
  2066                          3        3.0965                                  2.907

  N         50    50     50    50      200.0000
  χ² + N                                             57.664   55.813   51.902   50.698
  ν                                                   2        2        2        2
  P                                                   .023     .057     .399     .705

The details for the calculations of χ² are given in Table II. The numbers of
column 1 are the midpoints of the cells. To save space, the four theoretical
distributions obtained from Table I, col. 4 are written in the same column (6)
directly opposite the corresponding observed distributions given in columns
2 to 5. Through formula (10) we calculate the components of χ² + N (cols. 7
to 10). Although the four distributions differ only with respect to the beginning
of the first cell, the value of P for the observed distribution number (8) is more
than thirty times the value of P for the observed distribution number (7). In
view of the fact that these values of P are calculated for a fixed set of
observations, for the same theory, the same constants, and the same number of degrees
of freedom, the differences found are surprising.
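
The four totals of χ² + N can be recomputed from the observed contents of Table I
and formula (10). The sketch below rests on assumptions that should be read as
such: the blank observed entries of Table I are taken as zero, F(y) is evaluated
directly instead of being read from the probability table, and the cell limits
`cuts` (indices into the 25 cells of Table I, with the outer cells pooled so that
five cells remain) are inferred so as to agree with the printed totals; they are
not stated in this form in the text.

```python
import math


def F(y):
    """Probability (14) in the reduced variate y."""
    return math.exp(-math.exp(-y))


# Observed contents a_i of the 25 cells of Table I (blank entries taken as 0).
observed = [1, 1, 3, 3, 5, 1, 3, 3, 3, 6, 6, 4, 2,
            0, 2, 0, 2, 2, 0, 2, 0, 0, 0, 0, 1]
N = sum(observed)
edges = [-1.5 + 0.25 * i for i in range(24)]
cdf = [0.0] + [F(y) for y in edges] + [1.0]      # probability below each of the 26 limits


def chi_square_plus_n(cuts):
    """Sum of a^2 / (N p) over the combined cells, i.e. formula (10) plus N."""
    total = 0.0
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        a = sum(observed[lo:hi])
        total += a * a / (N * (cdf[hi] - cdf[lo]))
    return total


# Assumed groupings for the interval of four Table I cells (delta y = 1).
cuts = {
    7: [0, 5, 9, 13, 17, 25],
    10: [0, 4, 8, 12, 16, 25],
    9: [0, 3, 7, 11, 15, 25],
    8: [0, 6, 10, 14, 18, 25],
}
totals = {name: chi_square_plus_n(c) for name, c in cuts.items()}
for name, value in totals.items():
    print(name, round(value, 3))
```

Only the starting point differs between the four groupings, yet χ² ranges from
roughly 0.7 to 7.7.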

3. The probability integral transformation. This example shows that the
probability P may vary with the starting point in such a way that no conclusion
about the acceptance or rejection of a hypothesis can be obtained from the usual
χ² method. The three arbitrary steps described above may be avoided if we
choose cells of equal probability instead of cells of equal length. The required
intervals are obtained from the probability integral transformation, due to Karl
Pearson [6]. Let w(x) be a distribution of a continuous variate x, let y = W(x)
be the transformed variate, then the distribution v(y) of the variate y is

(17)  $v(y) = 1.$

In other words: The probabilities W(x) are uniformly distributed. If a
distribution w(x) has been chosen for a given set of observations x_m, we can control this
theory by investigating whether the "observations" W(x_m), i.e., the theoretical
cumulative frequencies of the observed values, are uniformly distributed. Thus,
the comparison of the observed distributions with any continuous theoretical
distribution is reduced to the comparison of an "observed" with a theoretical
uniform distribution. To a given set of observations and a given theory there
is one, and only one, "observed" distribution. If we introduce within w(x)
another set of constants, or choose instead of w(x) another theory φ(x), we
obtain, of course, other "observed" values [1].

The goodness of fit between this theory and these "observations" may be
measured by the χ² method. We divide the interval zero to N, which contains
the N "observed" numbers NW(x_m), into k cells of equal length, and enumerate
the "observed" points NW(x_m) contained in each cell. The starting point of the
classification is always zero. The expected number of observations for each cell
is always N/k. If we choose k sufficiently small, the necessity for combining
cells is eliminated. We have to choose k in such a way that the conditions,
under which formula (12) holds, are fulfilled. The question of the best choice
for the number of cells has been studied by Wald and Mann [3]. Their solution
is valid for small levels of significance and for large numbers of observations.
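
The procedure just described can be sketched as follows. The constants of W(x)
are those of formula (16); the twenty discharges and the choice k = 4 are
hypothetical values used only to make the example run, and the function names
are assumptions.

```python
import math


def W(x, u=1201.9, scale=266.1):
    """Theoretical probability (14), with the constants taken from formula (16)."""
    return math.exp(-math.exp(-(x - u) / scale))


def equal_probability_chi_square(xs, k):
    """Classify the values W(x_m) into k cells of equal probability and apply (10)."""
    n = len(xs)
    counts = [0] * k
    for x in xs:
        cell = min(int(k * W(x)), k - 1)     # cell index of N W(x) on the interval 0 to N
        counts[cell] += 1
    expected = n / k                          # the same expected number in every cell
    chi2 = sum(c * c for c in counts) / expected - n
    return counts, chi2


# Hypothetical flood discharges in 1,000 cubic feet per second.
xs = [905.0, 980.0, 1040.0, 1075.0, 1110.0, 1150.0, 1180.0, 1215.0, 1250.0, 1290.0,
      1325.0, 1360.0, 1400.0, 1445.0, 1490.0, 1540.0, 1600.0, 1680.0, 1790.0, 1950.0]
counts, chi2 = equal_probability_chi_square(xs, k=4)
print(counts, chi2)
```

Because the cells are fixed by the theory alone, neither a class length nor a
starting point has to be chosen; only k remains open.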

4. Conclusion. The usual χ² test is unreliable for a continuous variate as it
involves three arbitrary decisions. From the same observations, the same
theory, and the same constants different statisticians, equally well trained and
equally careful, may obtain different probabilities P, and may proclaim any one
of these results as final. Therefore, the usual χ² method does not lead to a
decision whether a hypothesis has to be rejected or not. Such a decision is possible
if we use the probability integral transformation. Unfortunately, the question
of the best choice of the cells for small numbers of observations and large levels
of significance is not yet solved.

A SAMPLING INSPECTION PLAN FOR CONTINUOUS PRODUCTION 1

By H. F. Dodge

Bell Telephone Laboratories, New York 
I. Introduction 

1. Purpose. This paper presents a plan of sampling inspection for a product 
consisting of individual units (parts, subassemblies, finished articles, etc.) manu¬ 
factured in quantity by an essentially continuous process. 

The plan, applicable only to characteristics subject to nondestructive inspec¬ 
tion on a Go-NoGo basis, is intended primarily for use in process inspection of 
parts or final inspection of finished articles within a manufacturing plant, where 
it is desired to have assurance that the percentage of defective units in accepted 
product will be held down to some prescribed low figure. It differs from others 
which have been published 2,3 in that it presumes a continuous flow of consecutive 
articles or consecutive lots of articles offered to the inspector for acceptance in the 
order of their production. It is accordingly of particular interest for products 
manufactured by conveyor or other straight line continuous processes. 

In operation, the plan provides a corrective inspection, serving as a partial
screen for defective 4 units. Normally, a chosen percentage or fraction f of the
units are inspected, but when a defective unit is disclosed by the inspection it is
required that an additional number of units be inspected, the additional number
depending on how many more defective units are found. The result of such
inspections is to remove some of the defective units, and the poorer the quality
submitted to the inspector, as measured in terms of per cent defective, the greater
will be the corrective or screening effect. The object of the plan is the same as
that incorporated in some of the sampling tables already published 5, namely,
to establish a limiting value of "average outgoing quality" expressed in per cent


1 Presented at the Joint Meeting of the American Society of Mechanical Engineers and
the Institute of Mathematical Statistics, May 29, 1943, by H. F. Dodge, Quality Results
Engineer, Bell Telephone Laboratories, New York.

2 H. F. Dodge and H. G. Romig, "Single Sampling and Double Sampling Inspection
Tables", Bell Sys. Tech. Jour., Vol. XX (1941) pp. 1-61. An unpublished paper by Prof.
Walter Bartky (developed when he was associated with the Western Electric Co., 1927)
provides a continuous multiple sampling plan involving two factors: f, as used here, and i,
the number of units in a "compensating sample" required to be inspected for each defective
unit found.

3 Lt. R. J. Saunders, "Standardized Inspection", Army Ordnance, Vol. XXIV (1943) pp.
290-292; G. Rupert Gause, "Quality Through Inspection", Army Ordnance, Vol. XXIV
(1943) pp. 117-120.

4 A unit of product that fails to meet the requirement for a characteristic is classed as
nonconforming with respect to that characteristic, and for convenience is referred to as
“defective”. Thus, a deviation from a specified requirement or from accepted standards
of good workmanship is termed a “defect”.

5 H. F. Dodge and H. G. Romig, loc. cit.
defective which will not be exceeded no matter what quality is submitted to the
inspector. This limiting value of per cent defective is termed the “average
outgoing quality limit (AOQL)”.

The theoretical solution treats the case of inspecting a continuous flow of 
individual units and is based on the distribution of random-order spacing of 
defective units in product whose quality is statistically controlled. 6 Part III of 
the paper extends the application of the method to a continuous flow of individual 
lots or sub-lots of articles. 

II. Inspection of a Flow of Individual Units 

2. Inspection of one characteristic. Consider first the inspection of a flow of
individual units, offered consecutively in the order of their production. Assume
that inspection is to be made for only one quality characteristic, so that
interest will be centered on one kind of defect. Subsequently (Section 14),
consideration will be given to the procedures when inspection is made
simultaneously for several kinds of defects.

3. Procedure A. The procedure is as follows: 

(a) At the outset, inspect 100% of the units consecutively as produced and 
continue such inspection until i units in succession are found clear of 
defects. 

(b) When i units in succession are found clear of defects, discontinue 100%
inspection, and inspect only a fraction f of the units, selecting individual
sample units one at a time from the flow of product, in such a manner as
to assure an unbiased sample.

(c) If a sample unit is found defective, revert immediately to a 100% inspection
of succeeding units and continue until again i units in succession are
found clear of defects, as in paragraph (a).

(d) Correct or replace with good units, all defective units found. 
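
Steps (a) to (d) can be followed unit by unit in a short simulation. The sketch below is only an illustration and not part of the original plan: the function name `simulate_procedure_a` is hypothetical, each unit is assumed to be defective independently with probability p, and the fraction f is realized by choosing each unit as a sample unit with probability f (one of several ways of obtaining an unbiased sample). Inspection is assumed perfect and every defective found is assumed to be replaced by a good unit.

```python
import random


def simulate_procedure_a(p, f, i, n_units, seed=0):
    """Simulate Procedure A over n_units produced at fraction defective p.

    Returns (fraction of units inspected, fraction defective in outgoing product).
    Assumptions (not from the paper): independent units, random selection of
    sample units with probability f, perfect inspection, replacement of defectives.
    """
    rng = random.Random(seed)
    screening = True   # step (a): begin on 100% inspection
    clear_run = 0      # consecutive nondefective units seen while screening
    inspected = 0
    escaped = 0        # defective units passed without being inspected
    for _ in range(n_units):
        defective = rng.random() < p
        if screening:
            inspected += 1
            if defective:
                clear_run = 0              # step (d): replaced; the count restarts
            else:
                clear_run += 1
                if clear_run == i:         # step (b): change over to sampling
                    screening = False
        elif rng.random() < f:             # this unit is taken as a sample unit
            inspected += 1
            if defective:                  # step (c): revert to 100% inspection
                screening = True
                clear_run = 0
        elif defective:
            escaped += 1                   # uninspected defective reaches the consumer
    return inspected / n_units, escaped / n_units


if __name__ == "__main__":
    print(simulate_procedure_a(p=0.02, f=0.10, i=50, n_units=200_000))
```

Running the function for a range of p values with f and i held fixed gives an empirical picture of how the outgoing fraction defective rises to a maximum and then falls as incoming quality worsens.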

4. Protection provided by the plan. The inspection plan is defined by the
two constants, f and i, which can be altered at will. For given values of f, i, and
p (incoming fraction defective), there will result for product of statistically
controlled quality a definite average outgoing fraction defective (average outgoing
quality, AOQ). For given values of f and i, the AOQ will have a maximum for
some particular fraction defective $p_1$ of incoming quality. As noted above, this
maximum is referred to as the average outgoing quality limit (AOQL). For all
other values of incoming fraction defective p greater or less than $p_1$, the AOQ
will be less than AOQL. Many combinations of f and i will result in the same
AOQL.

The protection offered by the plan discussed here can thus be expressed in 
terms of the AOQL, in per cent defective.

6 “Statistical control” as defined in the literature; see W. A. Shewhart, Statistical Method
from the Viewpoint of Quality Control, The Graduate School, U. S. Dept. of Agriculture, 1939.
5. Theoretical framework. We are concerned with the spacing between
defective units when the individual units are arrayed in the order of their
production, as shown in Fig. 1. If the manufacturing process is statistically
controlled so that the probability of producing a defective unit is constant and equal
to p, then defective units will have an order spacing of a random character which
is expressible in terms of certain probability laws. Product turned out by such
a process will be referred to as having a process average fraction defective p.
The “event” of particular interest is a “terminal-defect sequence” of i + 1
successive units following the observance of a defect, comprising a succession of i
nondefective units followed by a defective unit, as shown in Fig. 1. The totality
of all possible such sequences, where i varies from 0 to ∞, constitutes the
universe of events under consideration.

Each such sequence of i + 1 units, comprising i successive nondefective units 
followed by a defective one, has a definite probability of occurrence, for a process 
average fraction defective, p. The complete set of such probabilities for all 
possible sequences, having respectively i = 0, 1, 2, 3, ..., ∞, defines a probability
distribution 7 of random-order spacing of defects in uniform product. This is 


[Figure: units shown in order of production, X marking a defective unit and O a
nondefective unit, with one terminal-defect sequence indicated.]

Fig. 1. Spacing of defective units

shown in the table below in which O represents a nondefective unit, X represents
a defective one, p is the fraction defective, and q = 1 - p.

i      Sequence     Probability
0      X            p
1      OX           pq
2      OOX          pq^2
3      OOOX         pq^3
...    ...          ...
7 Romanovsky, V., "Due Nuovi Criteri di Controllo sull'Andamento Casuale di una
Successione di Valori", Giornale dell'Istituto Italiano degli Attuari (1932) discusses this
These probabilities are the successive terms in the infinite power series

        $p + pq + pq^2 + pq^3 + \cdots,$

(1)     or, $p(1 + q + q^2 + q^3 + \cdots).$

The sum of this series is $p/(1 - q) = 1$, i.e., the total probability for all possible
sequences is unity (as it should be).

The sum of the first i + 1 terms of the series is the probability of occurrence
of a “terminal-defect sequence” (defect spacing) of i + 1 units or less. The sum
of the first i terms is the probability, $P_i$, of failing to find the next i units clear
of defects, which is

(2)     $P_i = \sum_{j=0}^{i-1} pq^j = 1 - q^i.$

In turn, the sum of all terms beyond the ith term is the probability of finding 0
defects in the next i units, which is

(3)     $Q_i = 1 - P_i = q^i.$

These results and the power series (1) enter into subsequent portions of the
discussion. The curves of Fig. 2 give values of $1 - q^i$.
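
As a numerical check on (2) and (3), the partial sum of series (1) can be compared with its closed form. This is a minimal sketch; the function names `p_fail` and `q_clear` are illustrative and do not come from the paper.

```python
def p_fail(p, i):
    """P_i of equation (2): sum of the first i terms of series (1)."""
    q = 1.0 - p
    return sum(p * q**j for j in range(i))


def q_clear(p, i):
    """Q_i of equation (3): probability of finding 0 defects in the next i units."""
    return (1.0 - p) ** i


if __name__ == "__main__":
    p, i = 0.02, 50
    # The two printed values should agree up to rounding error.
    print(p_fail(p, i), 1.0 - q_clear(p, i))
```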

6. Average outgoing quality. Suppose a plan is selected, choosing specific 
values of f and i. 

For given values of i and p, there will be an expected average number of
units, u, inspected following the finding of a defect. Likewise, for given values
of f and p there will be an expected average number of units, v, that will be
passed under the sampling procedure before a defect is found. The latter
average number includes the sampling units actually inspected as well as the
uninspected units produced between successive sample units.

The average fraction of the total product units inspected in the long run is

(4)     $F = \dfrac{u + fv}{u + v}.$


It is now assumed for purposes of solution that the inspection operation itself 
never overlooks a defect and that all defective units found during the inspection 
of f and i will be corrected or replaced by good units. 8 


probability distribution of spacing of events, referring to the spacing as the “length of a
partial series”. Our term “terminal-defect sequence” has the same significance as his term
“partial series”. See also P. S. Olmstead, "Note on theoretical and observed distributions
of repetitive occurrences", Annals of Math. Stat., Vol. XI (1940) pp. 363-366; A. M. Mood,
“The distribution theory of runs”, Annals of Math. Stat., Vol. XI (1940) pp. 367-392.

8 The assumption that the inspection operation is perfect cannot be made without
reservation. Machine inspection devices have their margins of error. Also, inspection fatigue
prevents 100% manual and visual inspections from insuring perfection, particularly if such
inspections continue over a considerable period of time. But the efficiency of the latter
As a result of the screening effect of the inspection, the average outgoing
quality, AOQ, designated $p_A$, is related as follows to the incoming quality p:

(5)     $p_A = p(1 - F) = p\left(1 - \dfrac{u + fv}{u + v}\right).$

7. Determination of u. The average number of units, u, inspected on a 100% 
inspection basis following the finding of a defect is a function of i and p, and 
may be determined from a consideration of two power series, one limited and
the other infinite.

Once the 100% inspection starts, there are several things that can happen
before i units are found clear of defects. The first i may be found clear; or 1,
2, 3, or more defects may be found before finally a run of i units is found clear. 

One of the quantities to be determined is the average number of units inspected
in a “failure sequence,” that is, one terminating in a defect and comprising i
or less units. This average number, designated as h, is the average of the
distribution made up of the first i terms of the power series (1). The average is

(6)     $h = \dfrac{p}{1 - q^i}\,(1 + 2q + 3q^2 + 4q^3 + \cdots + iq^{i-1}),$

where the denominator is the sum of the probabilities for the first i terms. This
may be evaluated as follows:

        $h = \dfrac{p}{1 - q^i}\,\dfrac{d}{dq}\,(1 + q + q^2 + q^3 + \cdots + q^i)$

        $\;\; = \dfrac{p}{1 - q^i}\,\dfrac{d}{dq}\left[\dfrac{1 - q^{i+1}}{1 - q}\right]$

(7)     $\;\; = \dfrac{1 - q^i(1 + pi)}{p(1 - q^i)}.$

Note that if $q^i$ is small compared with unity, h is approximately 1/p.

The next step is to determine the average number of failure sequences that will
be encountered before finding i units clear of defects. This average number,
designated as G, may be found from the probability distribution of all possible
numbers of failure sequences, expressed by the infinite series

(8)     $Q_i(1 + P_i + P_i^2 + P_i^3 + \cdots),$

where $P_i$ is given by equation (2), $Q_i = 1 - P_i$, as given by equation (3), and
the successive terms are the probabilities of occurrence of 0, 1, 2, 3, etc. failure

inspections is generally higher when an interest incentive is provided, as is usually the case
in sampling inspection plans where the extent of such inspections hinges on their findings.

The solution given assumes correction or replacement of defective units. Where it is
expedient to reject such units and not replace them, equations (19) to (22) inclusive should
be modified by replacing i by i - 1.
Fig. 2. Curves defining distribution of random-order spacing of defects in uniform product
sequences before finding i units clear of defects. The average number of failure
sequences, G, is given by the sum of the infinite series

        $G = Q_i(0 + 1P_i + 2P_i^2 + 3P_i^3 + \cdots)$

(9)     $\;\; = Q_iP_i(1 + 2P_i + 3P_i^2 + 4P_i^3 + \cdots).$

Summing the series, we have

(10)    $G = Q_iP_i\,\dfrac{1}{(1 - P_i)^2} = \dfrac{P_i}{Q_i}.$
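
The summation used for (10), and again for (13), rests on the derivative of the geometric series. The short derivation below is added as a reader's aid and is not part of the original text; x stands for $P_i$ in (9) and for q in (12).

```latex
\sum_{k=0}^{\infty} x^{k} = \frac{1}{1-x},
\qquad
\frac{d}{dx}\sum_{k=0}^{\infty} x^{k}
  = \sum_{k=1}^{\infty} k\,x^{k-1}
  = 1 + 2x + 3x^{2} + \cdots
  = \frac{1}{(1-x)^{2}},
\qquad |x| < 1 .
```

With x = $P_i$ and $1 - P_i = Q_i$, the bracketed series in (9) equals $1/Q_i^2$, which gives $G = P_i/Q_i$.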



Now u, the average number of pieces inspected following the finding of a 
defect, is made up of a number of failure sequences followed by a run of i units 
clear of defects. Using the average values of G and h just found, we have 


(11)    $u = Gh + i = \dfrac{1 - q^i}{pq^i}.$
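
The assembly of u from G and h can be checked numerically against the closed form in (11). In this sketch h is computed directly as the mean of the truncated distribution, G as $P_i/Q_i$, and the result is compared with $(1 - q^i)/(pq^i)$. The function names are illustrative only.

```python
def failure_sequence_mean(p, i):
    """h: mean length of a failure sequence (i or fewer units, ending in a defect)."""
    q = 1.0 - p
    weighted = sum((k + 1) * p * q**k for k in range(i))  # length times probability
    return weighted / (1.0 - q**i)                        # divide by P_i


def failure_sequence_count(p, i):
    """G = P_i / Q_i: mean number of failure sequences before i clear units."""
    q = 1.0 - p
    return (1.0 - q**i) / q**i


def units_screened(p, i):
    """u assembled as G*h + i, the left-hand form of equation (11)."""
    return failure_sequence_count(p, i) * failure_sequence_mean(p, i) + i


if __name__ == "__main__":
    p, i = 0.02, 50
    q = 1.0 - p
    # Both printed values should agree up to rounding error.
    print(units_screened(p, i), (1.0 - q**i) / (p * q**i))
```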


8. Determination of v. The average number of units, v, that will be passed
in a period of sampling inspection will be 1/f times the average number of
individual sample units inspected in such periods. Here again the solution will
depend on the random order spacing of defects in uniform product. Whether the
individual units selected during the sampling inspection procedure are selected
by a random spacing device, or by any other means which will prevent known
bias in the sample, we may assume that defects will be found to occur in
accordance with the distribution of random order spacing defined by the terms of
the series given in (1). The average number of sample units inspected in a
period of sampling inspection will thus be the average defect spacing for product
having fraction defective, p, which is given by the infinite series

(12)    $H = p(1 + 2q + 3q^2 + 4q^3 + \cdots).$

Summing the series, we have

(13)    $H = \dfrac{p}{(1 - q)^2} = \dfrac{1}{p},$

and the value of v is found to be

(14)    $v = \dfrac{1}{fp}.$


9. Determination of f and i for a given value of AOQL. From the considerations
given above, the average fraction of the product inspected, F, and the
value of average outgoing quality, $p_A$, can be determined for any given values
of p, f, and i. Substituting in (5) the values of u and v given in (11) and (14),
we have

(15)    $p_A = p\left[1 - \dfrac{f}{f + (1 - f)(1 - p)^i}\right].$
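
The substitution leading to (15) can be verified numerically by computing F and the AOQ from u and v and comparing with the closed form. This is a sketch under the same assumptions as the theory (perfect inspection, statistically controlled product); the function names are illustrative.

```python
def aoq_from_cycle(p, f, i):
    """Return (F, AOQ) built from u and v, following equations (11), (14), (4), (5)."""
    q = 1.0 - p
    u = (1.0 - q**i) / (p * q**i)      # equation (11)
    v = 1.0 / (f * p)                  # equation (14)
    frac_inspected = (u + f * v) / (u + v)   # equation (4)
    return frac_inspected, p * (1.0 - frac_inspected)   # equation (5)


def aoq_closed_form(p, f, i):
    """AOQ from the closed form of equation (15)."""
    return p * (1.0 - f / (f + (1.0 - f) * (1.0 - p) ** i))


if __name__ == "__main__":
    # The AOQ from the cycle and from the closed form should agree.
    print(aoq_from_cycle(0.02, 0.10, 50), aoq_closed_form(0.02, 0.10, 50))
```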
The average outgoing quality limit, AOQL, ($p_L$) is the maximum value of $p_A$
that will result for any given values of f and i, considering all possible values of
p in the submitted product. The value of p for which this maximum value of
$p_A$ occurs is designated by $p_1$, hence

(16)    $p_L = p_1\left[1 - \dfrac{f}{f + (1 - f)(1 - p_1)^i}\right].$

The value of $p_1$ for which $p_A = p_L$ is determined by differentiating (15) with
respect to p, equating to 0, and solving for p, that is

(17)    $\dfrac{dp_A}{dp} = 1 - \dfrac{f\left[f + (1 - f)(1 - p)^i + pi(1 - f)(1 - p)^{i-1}\right]}{\left[f + (1 - f)(1 - p)^i\right]^2} = 0.$

Simplifying, and using the designation $p_1$ for the maximizing value of p, gives
$(i + 1)p_1 - 1 = \dfrac{1 - f}{f}(1 - p_1)^{i+1}$, or

(18)    $(1 - p_1)^i = \dfrac{f\left[(i + 1)p_1 - 1\right]}{(1 - f)(1 - p_1)}.$

Substituting in (16) this value of $(1 - p_1)^i$, we have

(19)    $p_L = \dfrac{(i + 1)p_1 - 1}{i}$, hence

(20)    $p_1 = \dfrac{1 + ip_L}{i + 1}.$

From (18) and (19), we have

(21)    $ip_L = \dfrac{1 - f}{f}(1 - p_1)^{i+1}$, hence

(22)    $f = \dfrac{(1 - p_1)^{i+1}}{ip_L + (1 - p_1)^{i+1}}.$


The curves given in Fig. 3 were calculated by choosing values of i for given
values of AOQL ($p_L$) and calculating $p_1$ from equation (20) and f from equation
(22). Thus for a given AOQL value, an i value may be found for a chosen f
value and vice versa. It will be noted that for a given value of f, i varies
inversely with the AOQL value, to a close degree of approximation.
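
The calculation behind the curves can be sketched as follows: choose an AOQL and an i, obtain $p_1$ from (20) and f from (22), and confirm by a grid search over p that the AOQ of (15) does not exceed the chosen AOQL. The function names and the grid spacing are illustrative assumptions, not taken from the paper.

```python
def plan_fraction(aoql, i):
    """Return (p1, f) for a target AOQL (as a fraction) and clearance number i."""
    p1 = (1.0 + i * aoql) / (i + 1.0)     # equation (20)
    tail = (1.0 - p1) ** (i + 1)
    f = tail / (i * aoql + tail)          # equation (22)
    return p1, f


def aoq(p, f, i):
    """Average outgoing quality from equation (15)."""
    return p * (1.0 - f / (f + (1.0 - f) * (1.0 - p) ** i))


if __name__ == "__main__":
    aoql, i = 0.01, 100
    p1, f = plan_fraction(aoql, i)
    # Grid search over incoming quality from 0.001% to 20% defective.
    grid_max = max(aoq(k / 100000.0, f, i) for k in range(1, 20000))
    print(p1, f, aoq(p1, f, i), grid_max)
```

Repeating the calculation for several values of i at a fixed AOQL traces out one curve of f against i.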

10. Operating characteristics of the plan. Figs. 4(a) and 4(b) give a picture
of the operating characteristics of the general plan as f and i are varied. They
indicate for example that for a moderate range of f values the factor i has a
stronger influence than f in determining the discrimination that the method
affords between high and low levels of incoming per cent defective. For the
values of f and i shown, Fig. 4(b) indicates just what level of incoming per cent
defective would force a correction of the manufacturing process, if the percentage
of total production that would be accepted on a sampling basis falls below a
critical value, often a value of the order of 80% to 90%.

Fig. 5 gives a comparison of the characteristics of several plans having the
same AOQL value, 1%. It indicates for example that when the normal level
of incoming per cent defective is well below the AOQL, the AOQL value can be
assured with less inspection by choosing f small and i large. But since, for a
given AOQL value, the average amount of inspection approaches a minimum
as f approaches 0, factors other than the minimum amount of inspection have a

[Fig. 4. Panels: (a) per cent of product units accepted without inspection; (b) per cent of
total production accepted on a sampling basis.]

[Fig. 5. Characteristics of three plans having the same AOQL of one per cent. Axes: per cent
of product units accepted without inspection; incoming per cent defective.]

more important influence on the choice of the most advantageous combination
of f and i values for a given set of circumstances. For example, when the
inspector is located at the end of the production line, it may be desirable to use
a value of i not greater than some small multiple of the number of product units
on the line at any one time. Or again, the value of f is often influenced by the
normal work loads of the inspector and the operators on the line. Protection
against “spotty” quality, such as may arise from temporary irregularities in
workmanship or materials, should receive special consideration in connection
with the choice of f.
11. Protection against spotty quality. The $p_t$ scale at the right of Fig. 3
provides a guide concerning the protection afforded against spotty quality in a
continuous run of product. The value of $p_t$ is the per cent defective in a run of
1000 consecutive product units, for which the probability of acceptance by
sample is 0.10 for a percentage sample equal to the corresponding f value shown on
the chart.

This scale indicates that the protection against spotty quality falls off very
rapidly with f and that the protection, considering runs of product of 1000
consecutive units each, becomes quite poor if f is less than 2%.
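
One way to approximate such a scale is sketched below. It rests on assumptions the text does not spell out: that the run of 1000 units is "accepted by sample" when a random sample of 1000f units, drawn without replacement, contains no defective unit, and that $p_t$ is the smallest per cent defective for which this probability is 0.10 or less. The function name `spotty_quality_pt` is hypothetical, and the values it returns need not match those read from Fig. 3.

```python
from math import comb


def spotty_quality_pt(f, run=1000, accept_prob=0.10):
    """Per cent defective in a run for which P(sample shows no defect) <= accept_prob.

    Assumption: a sample of round(f * run) units is drawn without replacement, so the
    chance of zero defectives is hypergeometric: C(run - d, n) / C(run, n).
    """
    n = round(f * run)
    for d in range(run - n + 1):           # d = number of defectives in the run
        if comb(run - d, n) / comb(run, n) <= accept_prob:
            return 100.0 * d / run
    return 100.0                           # no d qualifies (for example when n = 0)


if __name__ == "__main__":
    for f in (0.01, 0.02, 0.05, 0.10, 0.20):
        print(f, spotty_quality_pt(f))
```

Under these assumptions the computed value rises quickly as f is reduced, which is the behaviour the paragraph above describes.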

12. Effect of selecting group samples rather than one unit at a time. The
above development assumes selection of individual sample units one at a time
from the flow of product and immediate examination of a unit to determine 
whether or not it is defective. Deviations from this procedure will in general 
result in giving values of AOQL higher than those shown in Fig. 3. 

For example, the actual AOQL may be higher than the theoretical value (a) 
if the inspector delays looking at the individual units immediately when they 
are withdrawn from the line, or (b) if he selects a group of units at one time 
from the production line. The effect of either of these two deviations, both 
constituting a delay, may be quite large if i is small, or if large group samples 
are taken. 

Although the modification of the theoretical AOQL value resulting from the 
selection of group samples has not been thoroughly explored, this should not be 
excessive, 

(a) if group samples of n = 10 or less are drawn from the line, and 

(b) if i = 50 or more, 

provided there is no delay in examining the group samples drawn from the line. 

It should be noted however, that the effect of these delay factors on the AOQL 
may be compensated for in part if, when a defect is found, the 100% inspection 
includes some of the units that have already passed the inspection point. 

Where appreciable delays are unavoidable, an alternative is to withhold from 
acceptance a stipulated number of units pending the examination of the sample 
units that have been selected to represent this quantity of product. Such a 
procedure provides in effect a lot acceptance plan, the treatment for which is 
covered in Part III. 

13. Administration of inspection operations. The inspection plan is most 
effective in practice if it is administered in such a way as to provide an incentive 
to clear up causes of trouble promptly. Such an incentive may be had by
imposing a penalty on the operating or manufacturing department when defects
are encountered. Normally, no such penalty is imposed if both the sampling
inspection and the 100% inspection are performed by the same person or group
of persons and the two costs merged; the inspector then merely serves as an
agency for screening defects when quality goes bad. It is accordingly
recommended that the sampling inspection and the 100% inspection operations be
treated as two separate functions.

With this in mind, the inspection work can be performed by two different 
inspectors, designated inspector C and inspector M. Inspector C may be 
considered as the consumer's representative in that his work is performed as a
function independent of the manufacturing group. The term “consumer”
is used in the general sense of the recipient of the product after the inspection
has been completed. Inspector M is responsible to the Manufacturing
Department and the cost of his work is borne by that Department. His work must
however be subject to the surveillance and approval of inspector C.

The following method of administering the inspection plan can then be used: 

(a) Inspector C inspects the required fraction f. So long as no defects are
found, product is considered acceptable and is passed.

(b) When inspector C finds a defect, he

1. continues inspecting the fraction f,

2. places some identification on the succeeding flow of product to indicate
nonacceptance (or diverts it from the regular production line if the
design of the line permits), such designation to apply until clearance
is obtained in accordance with paragraph (c), and

3. calls inspector M to inspect the succeeding flow of product in
accordance with paragraph (c).

(c) Inspector M (one or more inspectors as needed) inspects all succeeding
units, except those inspected by inspector C in the fraction f, until the
required number of units, i, are found clear of defects. Inspector M
reports immediately to inspector C all defects found in the course of his
100% inspection and notifies him when a run of i units has been found
clear of defects.

(d) When notified that a run of i units has been found clear of defects,
inspector C, if satisfied with the work of inspector M, releases inspector M.

(e) To facilitate speedy correction of causes of trouble, inspector C, on finding
a defect, should promptly notify the production foreman or other
designated authority and furnish the latter with detailed information regarding
the character of the defect found.

It will be noted that the above procedure requires calling inspector M whenever 
inspector C finds a defect. To avoid taking such action on the occurrence of 
a single defect, the procedure can be modified so that inspector M is called into 
the picture only when two defects in succession are observed by inspector C. 
Where this feature is desired, paragraph (b) above may be modified to read 
as follows: 

(b) When inspector C finds a defect, he

1. proceeds immediately to inspect all succeeding units up to a total of
i units, and if no defects are found therein, he again limits his inspection
to the fraction f. If, on the other hand, during the course of inspecting
the next i units, inspector C finds a second defect, he immediately
discontinues his 100% inspection,

2. places some identification on the succeeding flow of product . . . etc.

While this procedure carries the disadvantage of placing a varying work load
on inspector C, it is often preferred since a single defect tends to be regarded as
an isolated occurrence whereas two defects in quick succession (like a first and
second offense) are normally accepted as sufficient evidence to justify special
action.

14. Inspection for several kinds of defects simultaneously. The procedure 
given above may be applied directly to an inspection covering two or more kinds 
of defects, provided that the chosen AOQL value applies to all defects collectively
and each unit inspected is always inspected for all of the defects under
consideration.

It is sometimes desired, however, when a defect of one kind is observed, to
confine the 100% inspection to this one kind of defect alone. This requires a
modification of the general procedure and the establishment of a separate AOQL
for each kind of defect. A similar modification is required, for example, where
the inspection is to cover several kinds of defects, but where the defects are
grouped into two or more classes, according to their seriousness, and the defects
in each class treated collectively.

The following paragraphs outline for illustrative purposes a procedure for
use where the defects under consideration are to be classified into two groups,
Major and Minor, and where all Major defects are to be treated collectively
and all Minor defects likewise. By analogy, the procedure to be followed when
each kind of defect is to be treated separately will be obvious. In any event,
the fraction f is made the same for all classes or all kinds of defects.

Procedure 

Several kinds of defects are grouped into two classes with respect to
seriousness, designated Major and Minor.

All defects of the same class (Major or Minor) are treated collectively.

Preliminary

(1) Establish an overall AOQL value for Major defects and an overall AOQL
value for Minor defects. Select a suitable value for f, applicable to both
Major and Minor defects. From Fig. 3 determine a value of i for Major
defects, designated $i_A$, and a value of i for Minor defects, designated $i_B$.

(2) At the outset, inspect 100% of the units consecutively for both Major
and Minor defects until $i_{Max}$ units in succession are found clear of defects
($i_{Max}$ = $i_A$ or $i_B$, whichever is the larger).

Routine

(3) When $i_{Max}$ units in succession are found clear of defects, discontinue 100%
inspection and inspect only a fraction f of the units for both Major and
Minor defects, selecting individual sample units one at a time from the
flow of product.

(4) If a Major (or Minor) defect is observed during sampling inspection,
inspect 100% of the succeeding units only for defects of the class in question
until $i_A$ (or $i_B$) units in succession are found clear of defects of this class.

(4.1) During the 100% inspection referred to in (4) inspect a portion f
for both Major and Minor defects.

(4.2) If during the 100% inspection for a particular class of defect (Major
or Minor), a defect of the other class is observed on an individual
unit of product, start 100% inspection for defects of the new class
only if the new defect is observed on one of the f units that has been
inspected for both Major and Minor defects, and continue such
100% inspection for defects of the new class until i (as determined
in (1) for the new class) units in succession are found clear of defects
of the new class. Do not take such action, however, if the new
defect happens to be observed on one of the non-f units.

(5) When the proper number of successive units are found clear of defects
as in paragraph (4) or (4.2), reinstate sampling inspection as in
paragraph (3).

From the above it may be appreciated that difficulties of administration are
introduced in treating a large number of classes of defects or a large number of
individual defects separately. How best to group defects together for collective
treatment can generally be determined from the nature of the inspection
operations, whether visual or gauging, and the expectancy of defects as determined
from the quality history. Items involving visual inspection can often be treated
collectively to advantage.

As is generally true, the layout of an inspection plan depends to a considerable
extent on the nature of inspection operations to be performed. Simplicity of
administration is always to be desired. From the standpoint of minimizing
overall inspection costs, it is often preferable, where several quality
characteristics are to be inspected, to break down the inspection work into two or more
separate inspection steps, each covering a relatively small number of
characteristics.

III. Inspection of a Flow of Individual Lots or Sub-lots 

15. Purposes of Inspection. A manufacturer’s inspection of his own product
serves two purposes 9:

(a) Process Control: To provide a basis for action with regard to the
production process with a view to better future product.

(b) Product Acceptance: To provide a basis for action with regard to the
product already at hand.

The plan outlined in Part II has both of these purposes in mind, but the
provision for selecting sample units continuously from the production line places
special emphasis on control. It aids, for example, in the prompt detection of
defects and location of causes of trouble in the manufacturing process.

9 See A. S. A. War Standard Z1.3, Control Chart Method of Controlling Quality During
Production, pp. 5-6, 1942, American Standards Association, New York.
The problem of acceptance of product is often eased, though at some sacrifice
to the control aspects of the inspection work, if product is submitted to the 
inspector in lots or sub-lots and a sample taken from each. 

16. Inspection procedure for sub-lots. With minor modifications, the plan 
and procedure of Part II can be extended to the case where material is offered 
as a flow of consecutive sub-lots of articles. In the inspection of parts, for example, 
the material may be offered in pan-loads or trays, each containing a collection 
of parts produced under essentially the same conditions. Or again, the product 
from a common source for a given short period of time, such as a half-hour, 
one hour, etc., may often be treated as a sub-lot and offered to the inspector as 
such for his acceptance. In what follows, however, it is essential that such 
sub-lots be kept in the order of their production. 

The theoretical development given in Part II makes use of random-order 
spacing of defects in a statistically controlled product, with the specific provision 
that the units inspected be selected in the order of their production. In applying 
the general plan to the inspection of a flow of consecutive sub-lots, we no longer 
have individual units available in the order of their production. But we can 
use the same theoretical framework if we consider the random spacing of defects 
as their spacing in the chain of inspected units arranged in the order of their 
inspection. The probability distribution of the spacing of defects in inspected 
units will be the same regardless of the manner of selecting the units to be 
inspected, so long as we hold to the concept of statistical control in our solution. 

The “i units in succession to be found clear of defects,” discussed in Part II,
will now be defined as i consecutively inspected units. During sampling
inspection, a group sample of units will be selected from each sub-lot, and the fraction
f will relate to the ratio of the number of units in the sample to the total number
of units in the sub-lot. The fraction f will be held constant for all sub-lots.
Furthermore, when it is required under the general plan to find i inspected units 
in succession clear of defects, the 100% inspection must be allowed to extend 
into immediately succeeding sub-lots if i units in succession are not found clear 
in the current sub-lot. 

17. Procedure B. The procedure is as follows: 

(a) At the outset, start inspecting 100% of the units in a sub-lot and continue 
such inspection until i inspected units in succession are found clear of 
defects. Extend the 100% inspection, if necessary, into one or more 
succeeding sub-lots in the order of their production. 

(b) When i inspected units in succession are found clear of defects, discontinue
100% inspection and inspect only a fraction f of the units from each of
the sub-lots, selecting the sample units in such a way as to fairly represent
the sub-lot.

(c) If a sample unit is found defective, start a 100% inspection of the remainder
of the sub-lot, and continue the 100% inspection until again i
inspected units in succession are found clear of defects, as in paragraph
(a), extending such inspection into succeeding sub-lots, if necessary.

(d) In the event the 100% inspection extends into one or more succeeding
sub-lots, if the number of units inspected in the last of such succeeding
sub-lots exceeds a fraction f of the number of units in the sub-lot, accept
this last sub-lot without further inspection. If, on the other hand, the
number of units inspected in this last sub-lot is less than the fraction f,
inspect additional units from this same sub-lot to make up a sample equal
to a fraction f of the number of units in the sub-lot.

(e) Correct or replace with good units all defective units found. 
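
Steps (a) through (e) can be sketched as a small simulation. This is only a minimal sketch under stated assumptions, not a definitive rendering of the plan: the function name `procedure_b` is hypothetical, sample units are assumed to be drawn at random from the sub-lot, and the sample size is assumed to be the sampling fraction times the sub-lot size, rounded up.

```python
import math
import random


def procedure_b(sublots, f, i, seed=0):
    """Simulate the sub-lot plan on a list of sub-lots.

    Each sub-lot is a list of booleans in production order (True = defective).
    Returns (units_inspected, defects_found, defects_passed).
    """
    rng = random.Random(seed)
    full = True          # step (a): begin with 100% inspection
    clear = 0            # consecutive clear units seen during 100% inspection
    inspected = found = passed = 0

    for lot in sublots:
        remaining = list(range(len(lot)))   # indices not yet inspected
        target = math.ceil(f * len(lot))    # assumed sample size for this sub-lot
        done_here = 0
        while remaining:
            if full:
                idx = remaining.pop(0)      # 100% inspection, in order
            elif done_here < target:
                # steps (b) and (d): draw, or top up, the sample at random
                idx = remaining.pop(rng.randrange(len(remaining)))
            else:
                break                       # sample complete: accept the rest
            done_here += 1
            inspected += 1
            if lot[idx]:
                found += 1                  # step (e): defective unit is corrected
                full, clear = True, 0       # step (c): revert to 100% inspection
            elif full:
                clear += 1
                if clear >= i:
                    full = False            # i clear in succession: resume sampling
        passed += sum(lot[j] for j in remaining)
    return inspected, found, passed
```

With three defect-free sub-lots of ten units, a sampling fraction of 0.2 and i = 5, the sketch inspects five units to qualify and then two units from each later sub-lot. The ratio of `defects_passed` to total units over a long simulated run gives a rough empirical look at outgoing quality under these assumptions.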

As was the case in Part II, the inspection plan is defined by two constants,
f and i, and the protection offered is expressed in terms of AOQL. This sub-lot
inspection plan differs from those already published in that the screening action 
is not confined to a single sub-lot but may extend over a succession of sub-lots, 
the entire production being regarded as a train of sub-lots that are linked together 
for purposes of inspection in the order of their production. 

IV Remarks 

It will have been noted that the plan here outlined should be regarded as a
“special purpose” plan applicable under the conditions which have been enumerated:
where production is practically continuous, where inspection is to be
made during production or immediately thereafter and is to serve not only as
a screening acceptance agency if necessary, but as an aid to process control by
disclosing promptly any sub-standard quality conditions in the product. It
is believed that the general plan provides a structure which, with possible variations
in procedure to serve particular circumstances, may be found useful in
designing additional sampling inspection techniques.



ON THE THEORY OF RUNS WITH SOME APPLICATIONS TO 
QUALITY CONTROL 1 

By J. Wolfowitz
Columbia University 

1. Recent developments in the theory of runs. The increasing number and
importance of recent advances in the theory and statistical applications of runs
may make a brief paper on the subject of some interest. The large volume of
material and its wide dispersal, together with the limitations of space, will of
necessity make these remarks far from exhaustive and complete.

I shall not define a run because new advances and applications of new criteria
to new problems would probably soon render most definitions obsolete. Runs
as used in statistics are best characterized by a philosophy and a technique rather
than by the employment of any one specific device. What is always involved is
the ordering of observations according to some characteristic and the resultant
effect of this ordering on the ordering according to some other characteristic.
For example, if the seats at a meeting of statisticians and engineers are numbered
and occupied by m engineers and n statisticians, then if we list the numbers of the
occupied seats in ascending order and replace each number by E or S according
as the seat is occupied by an engineer or statistician, we shall have a sequence of
m + n elements, m E’s and n S’s. Thus, if m = 7 and n = 6, such a sequence
might be

EEESEESSSESSE . 

If we were interested in knowing how well engineers and statisticians are acquainted
with one another, we should find it of interest to study the runs of E’s
and S’s in this sequence. Any subsequence of consecutive E’s or S’s which cannot
be enlarged is called a run. Thus in the example above there is a run of E’s
of length 3, followed in order by a run of S’s of length 1, a run of E’s of length 2, a
run of S’s of length 3, a run of E’s of length 1, a run of S’s of length 2, and a
run of E’s of length 1. Runs of this kind are usually called runs of two kinds of
elements. Naturally the characteristic according to which we order (in the
example above, seat number) and the characteristic whose runs are observed
(E or S) may be various. They ought in general to have a meaningful connection.
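
The definition of a run as a maximal block of identical consecutive symbols translates directly into code. The following is a minimal sketch; the function name `runs` is an assumption introduced for illustration, and the input is the seating sequence from the example above.

```python
from itertools import groupby


def runs(sequence):
    """Return the maximal runs of a sequence as (symbol, length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]


seating = "EEESEESSSESSE"          # m = 7 engineers, n = 6 statisticians
for symbol, length in runs(seating):
    print(symbol, length)
print("total number of runs:", len(runs(seating)))
```

`itertools.groupby` groups only adjacent equal elements, which is exactly the "cannot be enlarged" condition in the definition.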

The order of observations has no value if it is known that the observations are 
independent and random from the same universe and one seeks to estimate a 
parameter of the universe. Many of the statistical problems treated in the
literature are of this character. In quality control of manufactured articles one
of the fundamental problems is to decide whether the observations are “random,”
or in the language employed in this field, whether statistical control exists. For
this purpose indiscriminate pooling of data which suppresses the order characteristics
of the observations represents a loss of valuable information.

1 Revised from an expository address delivered at a joint meeting of the Institute of
Mathematical Statistics and the American Society of Mechanical Engineers at New York,
May 29, 1943, at the invitation of the program committee.

The algebra of runs of two kinds of elements is fairly elementary and most of
the distribution problems involved have been solved. Suppose an urn contains
m white balls and n black balls, thoroughly mixed, and m + n drawings are
made without replacement. There are

$$\frac{(m+n)!}{m!\,n!}$$

different sequences of W’s and B’s possible, and each sequence has the same
probability. Let us find in how many ways the m elements W can be arranged
to give k runs. By a trick due to Euler, this is the coefficient of $x^m$ in the
purely formal expansion of

$$(x + x^2 + \cdots + x^m)^k,$$

which is the same as the coefficient of $x^m$ in the formal expansion of

$$(x + x^2 + x^3 + \cdots)^k = \left(\frac{x}{1-x}\right)^k,$$

and is therefore $\binom{m-1}{k-1}$ (which is, of course, the combinatorial symbol for
$\frac{(m-1)!}{(m-k)!\,(k-1)!}$).

It is easy to see that the number of sequences of W’s and B’s which have 2k
runs of both kinds is

$$2\binom{m-1}{k-1}\binom{n-1}{k-1},$$

and hence that the probability that U, the number of runs of both kinds, be
2k is

$$P\{U = 2k\} = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}.$$

The details of this and other relevant derivations can be found in Wilks [1],
Mood [2], Wald and Wolfowitz [3], and Stevens [12]. The formulae given there
are of the type given above; e.g., for the probability that U = c. Application
to tests of significance usually requires formulae of the type which give the probability
that U < c. This causes some difficulty in application and raises a need
for suitable tables. Useful tables have been given by Swed and Eisenhart [4]
and by P. S. Olmstead in an article by Mosteller [5]. The latter table really
deals with a special case of runs of two kinds of elements.
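
The counting argument for an even number of runs can be checked by brute force for small m and n. The sketch below is illustrative only: the function names are assumptions, and the closed form it tests is the even case, P(U = 2k) = 2 C(m-1, k-1) C(n-1, k-1) / C(m+n, m), obtained by dividing the number of sequences with 2k runs by the number of equally likely sequences.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb


def count_runs(seq):
    """Total number of runs of both kinds in a sequence."""
    return sum(1 for _ in groupby(seq))


def exact_distribution(m, n):
    """Distribution of U found by enumerating every placement of the m W's."""
    total = comb(m + n, m)
    counts = {}
    for w_positions in combinations(range(m + n), m):
        w = set(w_positions)
        seq = ["W" if pos in w else "B" for pos in range(m + n)]
        u = count_runs(seq)
        counts[u] = counts.get(u, 0) + 1
    return {u: Fraction(c, total) for u, c in counts.items()}


def prob_even(m, n, k):
    """Closed form for P(U = 2k) with fixed numbers m and n of the two kinds."""
    return Fraction(2 * comb(m - 1, k - 1) * comb(n - 1, k - 1), comb(m + n, m))


dist = exact_distribution(7, 6)
for k in range(1, 7):
    print(2 * k, dist.get(2 * k, Fraction(0)), prob_even(7, 6, k))
```

Exact rational arithmetic is used so that the enumeration and the formula can be compared for equality rather than to within rounding error. Summing such probabilities over c gives the cumulative values that the published tables supply.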

The devices described above were systematically utilized by Mood [2] to give
a valuable collection of formulae. A representative result is that the joint distribution
of the numbers of runs of length 1, 2, ..., p and all those of length
greater than p is asymptotically normal, with means and covariance matrix
given.

The results given by Mood are limited to a classification of runs into a finite
number of classes. The author [6] has given a general result which permits
weighting runs of all lengths.

Closely allied to runs of two or more kinds of elements are runs from a binomial
or multinomial population. If the observations are classified into k classes,
designated by 1, 2, ..., k say, and each observation has a constant probability
$p_i$ of falling into the ith class (i = 1, 2, ..., k), then a sequence of l observations
all of which belong to the same class and which is preceded and followed by observations
which belong to another class (except, of course, when the sequence
is at the beginning or at the end of the series) is called a run of length l. If a
coin, whether unbiassed or not, is tossed repeatedly, the runs of heads and tails
are runs from a binomial population (i.e., k = 2) and if the coin is unbiassed,
$p_1 = p_2 = \tfrac{1}{2}$.

The algebra of these runs has been studied mainly by von Bortkiewicz [7],
von Mises [8], Wishart and Hirshfeld [9], Cochran [10], and Mood [2]. Runs
from a binomial population (say) differ from runs of two kinds of elements in
that m and n (defined above) are chance variables. If therefore, in general, a
distribution formula valid for a fixed m and n be multiplied by the probability
of this particular set of m and n, $\binom{m+n}{m} p_1^m p_2^n$, and summed over m and n,
the result will be the corresponding distribution formula for runs from a binomial
population. Von Bortkiewicz [7], Cochran [10] and Mood [2] derived the essential
parameters involved. Wishart and Hirshfeld [9] proved the asymptotic
normality of the total number of runs from a binomial population, and these
results were generalized by Mood [2].

Von Mises [8] proved that if N be the number of observations from a binomial 
population, the distribution of the number of runs of a length which is of the 
order of log N approaches the Poisson distribution with increasing N. 

Cochran [10], extending the work of Gold [11], made use of runs of this kind in 
order to study what they called “the persistence of weather”, i.e., whether dry 
months tend to follow dry months and wet months to follow wet months. In a 
long series of weather observations the months were classified as wet or dry and
a four-fold table constructed of the number of months falling into each of the 
following categories: 

(a) wet month following a wet month 

(b) wet month following a dry month 

(c) dry month following a wet month 

(d) dry month following a dry month. 

The chi-square test was applied to the four-fold table to test the null hypothesis 
that the probability of whether a month was wet or dry was independent of what 
its predecessor had been. 
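
The four-fold table and its chi-square statistic can be sketched as follows. This is a minimal illustration, not a reproduction of Cochran's computation: the function names are assumptions, the month sequence is invented purely for demonstration, and the statistic is the usual Pearson chi-square for a 2 x 2 table without continuity correction.

```python
def fourfold_counts(months):
    """Count the categories (a)-(d) from a string of 'W' (wet) and 'D' (dry)."""
    a = b = c = d = 0
    for previous, current in zip(months, months[1:]):
        if current == "W" and previous == "W":
            a += 1          # (a) wet following wet
        elif current == "W" and previous == "D":
            b += 1          # (b) wet following dry
        elif current == "D" and previous == "W":
            c += 1          # (c) dry following wet
        else:
            d += 1          # (d) dry following dry
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square with rows (a, c) = predecessor wet, (b, d) = predecessor dry."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


record = "WWWDDDWWWDDD"     # hypothetical twelve-month record
counts = fourfold_counts(record)
print(counts, chi_square_2x2(*counts))
```

Under the null hypothesis the statistic is referred to the chi-square distribution with one degree of freedom; a real application would use a far longer series than this toy record.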

Olmstead [13] has made use of a run which is very similar to that of a run from
a binomial population, except that the sequence terminates whenever an observation
on a specified one of the two classes (a “failure”) is recorded. The author
[6] has used a run defined as a sequence of consecutive integers in a permutation
of the first n integers to test whether two variates are independently distributed
when nothing is known about their distribution functions except that they are
continuous. The rank correlation coefficient is usually employed for this purpose.

Of great importance in quality control of manufactured output are runs up and
down. If, in any of the n! equally likely (by hypothesis) permutations of the
first n integers, we subtract each element from its successor and replace the result
by + or − according as the difference is positive or negative, we get runs of +
signs and − signs, called respectively runs up and down. The usage of the term
length varies; in this paper we shall say that the length of a run is the number of
+ or − signs in it. This has the advantage that then the sum of the lengths of
all the runs is n − 1. (Most quality control literature, which follows Shewhart
[14] and Kermack and McKendrick [15], defines the length of a run as one more
than the number of + or − signs in it.) Thus, for example, the sequence

3 4 7 6 5 1 2

will appear as

+ + − − − +

after the + and − signs have been inserted, and has an ascending run of length
2, followed by a descending run of length 3, followed by an ascending run of
length 1.
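
The construction of runs up and down can be sketched in a few lines. The function name `runs_up_down` is an assumption made for illustration; the length convention is the one adopted in this paper (the number of signs in the run), and the input is the seven-element example above.

```python
from itertools import groupby


def runs_up_down(values):
    """Return the sign string and the runs up and down as (sign, length) pairs."""
    signs = ["+" if later > earlier else "-"
             for earlier, later in zip(values, values[1:])]
    run_list = [(sign, len(list(group))) for sign, group in groupby(signs)]
    return "".join(signs), run_list


signs, run_list = runs_up_down([3, 4, 7, 6, 5, 1, 2])
print(signs)
print(run_list)
print(sum(length for _, length in run_list))   # lengths add up to n - 1
```

Adding one to each length gives the convention used in most of the quality control literature.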

The distributions associated with runs up and down in general present mathematical
difficulties greater than those associated with distributions of runs of two
kinds of elements and the results are far from complete. The asymptotic
expectation of $r_p$, the number of runs of length p, was given with great brevity
by Fisher [16] and in detail by Kermack and McKendrick [15], and the exact
result was supplied by Wallis and Moore [17]. The matrix of covariances among
the runs of various lengths is being computed, and, it is hoped, will be available
for publication shortly. As far as the author is aware, no explicit formulae
giving the probability that $r_p = k$ or that $r_p < k$ are known. Some recursion
formulae of limited usefulness are available.

The author has recently obtained the asymptotic distributions of $r_p$, of
$r_{p_1}, r_{p_2}, \ldots, r_{p_k}$ jointly, and of related statistics. These are jointly normal.
Hence certain quadratic forms in these variables have approximately the chi-square
distribution.

Anticipating somewhat the discussion to be given below, it may be mentioned
here that the quadratic forms in certain of the $r_p$ which Kermack and McKendrick
[15] use to test for randomness do not have the chi-square distribution which
Kermack and McKendrick ascribe to them. Wallis and Moore [17] first pointed
out that these quadratic forms were not the proper chi-square statistics for goodness
of fit because of correlation among the $r_p$. The author’s recent results
show that these forms do not have the chi-square distribution.

2. Remarks on applications of runs. Let us now turn to statistical applications
of some of the runs described above. Suppose we have a sample of m
random independent observations on one variate and a similar sample of n
observations on another variate. Suppose further that nothing is known a
priori about the distribution of each except that both are continuous, and it is
desired to test whether the two distributions are identical. This problem is of
great practical importance and occurs frequently. In quality control of manufactured
output it may occur, for example, if we wish to test whether the output
of two machines, two workers, two different processes, or that from raw material
obtained from two different sources, is the same. Naturally the problem not
only of two, but in general, of a larger number of samples may arise.

The solution proposed in [3] is as follows: Let the m + n observations be
arranged in order of, say, ascending size, and let each observation be replaced by
F or S according as it comes from the first or second sample. The total number
U of runs in both F and S is the statistic to be used. Small values of U are the
critical values for rejecting the hypothesis of identity of distributions. Thus in
the example above of the seating of statisticians and engineers in the auditorium,
a small value of U, which implies that the S (statisticians) and the E (engineers)
each tend to bunch together, would be regarded as evidence that the statisticians
and engineers present are not well acquainted with one another.

The statistic U seems a not unreasonable one for the purpose. A discrepancy
between the two distribution functions will make alternation of values of the two
variates less frequent. This idea was proved for large n in [3], where a generalized
concept of statistical consistency is given.

On the other hand, the choice of U as a statistic is arbitrary; other reasonable
criteria can certainly be given (see, for example, Dixon [19]). In [3] it is shown
that a criterion which had previously been proposed was not acceptable because
the statistic was not consistent, but nevertheless consistency is a property enjoyed
by many statistics and constitutes only a partial check on the arbitrariness
of choice. An “abnormally” long run in one or both variates which would be
regarded by “common sense” as an indication that the hypothesis ought to be
rejected, might be accompanied by a large number of runs of length one which
might make the value of U not critically low. Some writers suggest that the
presence of a long run of sufficient length be regarded as indicating rejection of
the null hypothesis. In that case, if most of the runs were comparatively long,
while none were critically long, the null hypothesis would not be rejected under
this criterion, but the value of U would be small. A step has been made in the
direction of setting up a criterion for the choice of statistic ([6]) so as to remove
this arbitrariness. This involves an extension of the likelihood ratio principle.
It must be remembered, however, that almost any criterion will fail to reject
some sequence which, it seems intuitively, ought to be rejected. All statistical
inference involves risks of error; one object of the science of statistics is to minimize
these risks.

Another possible test for the problem of two samples is to compare the numbers
of runs of various lengths with their expected numbers by the proper chi-square
(Caution: the correlation among the variates must be taken into account).
The author [6] has developed another test from an extension of the likelihood
ratio.

Whenever a uniformly most powerful test does not exist, and this is the case 
in most non-parametric problems, it is not usually possible to say that one test 
is more powerful than another, unless the set of alternatives is sufficiently delimited.
The power function is then the ultimate criterion for the choice of
statistic. 

If a sequence of n unequal numbers be given, a very important question is to
decide whether the sequence is a “random” one; if it is and the sequence represents
measurements on a characteristic of successive products of some manufacturing
process, the latter is said to be in statistical control. A precise mathematical
formulation can be given to this statement about randomness. Let
$X_1, X_2, \ldots, X_n$ be chance variables, and let $x_1, x_2, \ldots, x_n$ be a set of random
observations on the corresponding variables. To test whether $x_1, x_2, \ldots, x_n$
is a “random” sequence means to test the hypothesis that $X_1, X_2, \ldots, X_n$
are independently distributed and have identical distribution functions. This is
in general a difficult problem, chiefly because of the large class of alternatives to
the null hypothesis.

Since the null hypothesis does not specify the distribution functions but only 
asserts their identity, the tests most generally sought have been such that their 
size is independent of the unknown (but identical for all the chance variables) 
distribution function. Certain reasonable procedures have been based on the 
numbers and lengths of runs up and down in the sequence. 

R. A. Fisher [16] suggested doing this, but gave no indication as to what
statistic was to be used. Kermack and McKendrick [15] and Wallis and Moore
[17] propose the following procedure, the former writers implicitly and the latter
explicitly: Let

$$r'_p = \sum_{i=p}^{n-1} r_i$$

and denote by $\bar{x}$ the expectation of the general chance variable x. The proposed
statistic is

$$\frac{(r_1 - \bar{r}_1)^2}{\bar{r}_1} + \cdots + \frac{(r_{p-1} - \bar{r}_{p-1})^2}{\bar{r}_{p-1}} + \frac{(r'_p - \bar{r}'_p)^2}{\bar{r}'_p}$$

with the critical region the upper tail. Wallis and Moore recommend p = 3 and
approximate the distribution by empirical methods. As we have seen above,
Kermack and McKendrick err in ascribing to the statistic the chi-square distribution.
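
A statistic of this form can be computed for short sequences without any distribution theory by obtaining the expectations through complete enumeration. The sketch below is an illustration under stated assumptions, not the procedure of any of the cited writers: the function names are hypothetical, runs of length p or more are pooled into one class, lengths follow this paper's convention (number of signs), and the expectations are found by listing all n! orderings, which is practical only for small n (and requires p to be at most n − 1 so that no expectation is zero).

```python
from fractions import Fraction
from itertools import groupby, permutations


def run_length_counts(values, p):
    """Return [r_1, ..., r_{p-1}, r'_p]: runs up and down of each length
    below p, then the number of runs of length p or more."""
    signs = [later > earlier for earlier, later in zip(values, values[1:])]
    counts = [0] * p
    for _, group in groupby(signs):
        length = len(list(group))
        counts[min(length, p) - 1] += 1
    return counts


def expected_counts(n, p):
    """Exact expectations under randomness, by enumerating all n! orderings."""
    totals = [0] * p
    n_perms = 0
    for perm in permutations(range(n)):
        n_perms += 1
        for j, c in enumerate(run_length_counts(perm, p)):
            totals[j] += c
    return [Fraction(t, n_perms) for t in totals]


def runs_statistic(values, p):
    """Sum of (observed - expected)^2 / expected over the p run-length classes."""
    observed = run_length_counts(values, p)
    expected = expected_counts(len(values), p)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(float(runs_statistic([3, 4, 7, 6, 5, 1, 2], p=3)))
```

Because the run counts are correlated, the value returned should not be referred to a chi-square table; the same enumeration could instead be used to tabulate the exact null distribution of the statistic for small n.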

The criticism has been made by Olmstead [19] that this statistic is insensitive
to pronounced trends in the data. This is correct, and had been pointed out
earlier in [17], where the prior removal of a trend is recommended. Since one of
the important problems of quality control is detection of a trend, this would limit
the usefulness of the statistic for quality control purposes.

It happens frequently when a new rank statistic has been proposed for testing 
a non-parametric hypothesis such as that of “randomness” above, that critics of 
the proposed criterion construct sequences which, they say, appealing to “ordinary
common sense”, any reasonable statistic ought to place in the region of
rejection for almost any size of test. They then cheerfully point to the fact that 
the proposed statistic does not act in this reasonable fashion. A few remarks 
about this may not be amiss. 

A test for, say, “randomness”, which is to be made on the sequence of ranks, is
really a numbering of the n! permutations of, say, the first n integers, according
to the order in which they ought to be taken into the critical region in order to
make the latter of any prescribed size. This numbering could even be done by
tabulating, for different n, the various sequences in their proper order. Aside
from the obvious practical obstacles to such a tabulation, there would soon arise
the difficulty that, after the “obvious” sequences are assigned their places, the
investigator would have difficulty in assigning to most of the remainder an ordering
according to the degree in which they may be held to “contradict” the null
hypothesis. Resort is therefore made to a statistic which can be given as an
analytic expression in the ranks. Because of the inadequacies of the theory the 
formula is often chosen by analogy with a similar formula in classical statistics. 
Difficulties may arise because of this. 

Let us examine for a moment this intuitive notion of reasonableness. Most 
people, and even most statisticians, would agree that the sequence of the first n 
integers in ascending order is an indication of non-randomness. The basis for 
this notion is an intuitive conception of an alternative to the null hypothesis for 
which this sequence is very probable. The fact is, however, that if we admit all 
alternatives to the hypothesis of randomness, for any sequence of ranks whatever 
there exist infinitely many alternatives which assign to this sequence a probability 
of one. 

It seems to us that the difficulty can be met to a large extent by delimiting the 
class of distributions which constitute the alternatives to the null hypothesis, 
and by assigning to the admissible alternatives a weight function which measures 
the importance of the various alternatives (e.g., the financial loss caused by each). 
A profound treatment of this subject for the parametric case has been given by 
Wald [20]. This method has also the great merit that it removes the need for a 
choice of size of the region of rejection. 

In the control of the quality of mass production output one of the outstanding
problems is to decide on the basis of a sequence of observations on the product
whether the production process is in statistical control. Shewhart and his
school of industrial statisticians base many of their tests on the sequence of
ranks. On the basis of their experience they find that the causes which most
often lead to a breakdown of statistical control are such as to cause shifts up and
down in the level of the observations or trends in the observations. To detect
the former they have devised the technique of runs above and below the median
and to detect the latter they use runs up and down. Runs above and below the
median may be described briefly as follows: The 2m + 1 (an odd number) observations
furnish a sequence of rankings from 1 to 2m + 1. The elements 1 to
m are considered to be elements of one kind and the elements m + 2 to 2m + 1
elements of another kind. We then have a special case of runs of two kinds of
elements. Limitations of space prevent the presentation of more detail or a
description of the ingenious scheme by which both kinds of runs are graphically 
exhibited. The reader is referred to [14], [5], and [21], among others. The tests 
used in the industrial applications are not always explicitly stated, nor do they 
always seem to be the same. The most common involve comparison of runs of 
various lengths with their expected number or else are based on the presence of 
abnormally long runs. 
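
Runs above and below the median reduce to runs of two kinds of elements once the median observation is set aside. The following is a minimal sketch assuming an odd number of distinct observations; the function name and the sample data are hypothetical.

```python
from itertools import groupby
from statistics import median


def runs_about_median(observations):
    """Label each observation A (above) or B (below) the median, omit the
    median itself, and return the runs as (label, length) pairs."""
    med = median(observations)
    labels = ["A" if x > med else "B" for x in observations if x != med]
    return [(label, len(list(group))) for label, group in groupby(labels)]


print(runs_about_median([1, 2, 3, 7, 8, 9, 5]))   # a shift in level: few, long runs
print(runs_about_median([5, 1, 7, 3, 9, 2, 8]))   # rapid alternation: many short runs
```

The first series, in which the level shifts upward part way through, yields only two runs, which is the kind of pattern the technique is meant to expose.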

A pretty application of the theory may be found in Campbell [21]. The corrosion
of a copper plate was determined by a delicate mechanism which measured
the electrical resistance in various places on the plate. The rectangular plate
was divided by rows and columns into forty small rectangles in each of which a
measurement was made. The readings were made in each column in successive
order from one end to the other, and the columns were also measured in successive
order from one edge to the other. The observations, when examined for
runs above and below the median and runs up and down, indicated something
amiss (“absence of statistical control”). Two causes were considered possible:

(a) variations, over the plate, in the corrosion of the copper; 

(b) malfunctioning of the delicate measuring apparatus. 

The runs obtained by arranging the observations in successive order according
to positions on the plate might be expected to be associated with (a), while the
runs obtained by arranging the observations in temporal order might be expected
to be associated with (b). The object was therefore to separate the two orderings
and this was done as follows: The rectangles were numbered 1 to 40 in the
order in which the first observations had been made and a random permutation
of this sequence was used to indicate the order in which the next set of observations
was to be made. The second set was then ordered in two different ways,
first according to the temporal order of the observations, and second according
to the original ordering by positions. The runs above and below the median and
the runs up and down, in the first ordering of the second set of observations gave
evidence of a lack of statistical control, while those in the second ordering of the
same set did not. An investigation located the trouble in the measuring apparatus.

3. Conclusion. The manifold achievements of quality control as it is practiced
at present point to the desirability of still further development of theory
and practice. We conclude this paper by suggesting a few directions in which 
the theory of runs could develop and be of greater assistance in quality control. 

(1) The kinds of runs and the statistics used for making decisions in a production
process should be chosen on the basis of the kind of deviations from the
state of statistical control which the engineers consider most likely to occur.
It is very likely that different production processes may require different statistical
procedures.

(2) General distribution theorems should be developed, power functions obtained,
and the correlations between different tests investigated.

(3) The application of the weight function idea of minimizing financial losses 
should be considered. 

In these developments both engineers and mathematical statisticians would
have important and complementary roles. The tempo of progress will depend
in large part on the cooperation between them.



THE ACCURACY OF SAMPLING METHODS IN ECOLOGY

By Paul G. Hoel 
University of California at Los Angeles 

1. Introduction. For a number of years journals on ecology have contained
articles on sampling techniques for estimating the distribution of common species
of plants in various regions. Although much experimental work has been done
on this problem and although the problem is essentially statistical in nature, no
theoretical work of any consequence seems to have been attempted. This paper
considers the question of the relative accuracy of common sampling methods
from a theoretical point of view by means of geometrical probability and statistical
distribution theory.

There are three common methods of sampling used by ecologists. They are 
designated by the names of coverage, abundance, and frequency. For each of 
these methods of sampling, there are two common choices of sampling unit, 
namely, the quadrat and the transect. By the coverage of a species in a region 
is meant the total area covered by the projection on the ground of the crowns of 
the plants of this species. By abundance is meant the total number of plants 
of this species in the region. By frequency is meant the number of sampling
units in the region in which at least one plant of the species occurs. A quadrat 
is a sampling unit in the form of a square, usually several yards on a side. A 
transect is a sampling unit in the form of a straight line, coverage in this case 
being the length of line covered by the projection of the crowns. 

In this paper it will be assumed that plants possess circular crowns. Further,
it will be assumed that plant species distribute themselves at random in the 
region to be sampled. This is not necessarily the case, since there is often a 
tendency for plants of a given species to distribute themselves at random or 
otherwise in groups rather than as single plants. However, if sampling units 
are somewhat comparable in size, the relative accuracy of these methods of 
sampling based on a random distribution would be expected to hold fairly well 
for distributions somewhat removed from this ideal situation. Further, by the 
proper choice of sampling unit size, some non-random distributions behave very 
much as though they were random. 

The accuracy of a sampling method may be measured by the variance of the 
estimate of the quantity which is of interest. Here interest will be centered 
on the total coverage of a given species in the region being sampled. Thus, two 
sampling methods will be said to be equally accurate for coverage if they produce 
equal variances for the estimate of total coverage. 

The quadrat unit of sampling will be considered first for the three methods
of sampling, after which the transect unit will be considered.

2. Quadrat coverage. Let the region to be sampled be a square B units on
a side. Let there be n quadrats, each a square A units on a side, distributed
at random in the region. Finally, let the total number of plants of the species
in question in the region be N, with the distribution of the radius of their crowns
given by a frequency function f(r) whose explicit form will be specified later.

First, consider a single plant of radius r and a single quadrat. The problem 
is to determine the variance of a, the area of that part of the plant lying in the 
quadrat. Now the probability that this plant will be found in any particular 
part of the region is obtained by treating the plant as a circle of radius r which 
is thrown at random in the region and then applying geometrical probability 
to the position of the center of the circle. Thus, considering only those situations 
when the center of the circle lies in the region, the probability that the circle 
will cover an area of at least $a > \tfrac{1}{2}\pi r^2$ units of the quadrat is given by the ratio
of the area of the subregion inside the quadrat whose boundary is the locus of 
centers of circles of radius r which have precisely a units of their area inside 
the quadrat, to the area of the region. Probabilities of this type may be treated 
as functions of a. The expressions below for such probabilities follow directly 
from Fig. 1, which displays one corner of the quadrat. 

$$
\begin{aligned}
P_1[a < \text{area} < \pi r^2] &= 4S_1/B^2, \qquad a > \tfrac{1}{2}\pi r^2, \\
P_2[0 < \text{area} < a] &= 4S_2/B^2, \qquad a < \tfrac{1}{2}\pi r^2, \\
P_3[a = \pi r^2] &= (A - 2r)^2/B^2, \\
P_4[a = 0] &= 1 - \sum_{i=1}^{3} P_i .
\end{aligned}
\tag{1}
$$

Now

$$S_1 = (A - r)(r - z) - \int_z^r y \, dx, \tag{2}$$

where y is the ordinate of curve $C_1$. Likewise

$$S_2 = A(r + z) - z^2 + \tfrac{1}{4}\pi r^2 + \int_0^{-z+\sqrt{r^2 - z^2}} y' \, dx', \tag{3}$$

where y' is the ordinate of curve $C_2$ with respect to the primed axes and z is
negative. Using the formula for the area of a segment of a circle, the equation
of $C_1$ is easily found to be

$$x\sqrt{r^2 - x^2} + r^2 \sin^{-1}\frac{x}{r} + y\sqrt{r^2 - y^2} + r^2 \sin^{-1}\frac{y}{r} = a, \qquad x^2 + y^2 > r^2, \tag{4}$$

$$x\sqrt{r^2 - x^2} + r^2 \sin^{-1}\frac{x}{r} + y\sqrt{r^2 - y^2} + r^2 \sin^{-1}\frac{y}{r} + 2xy + \tfrac{1}{2}\pi r^2 = 2a, \qquad x^2 + y^2 < r^2, \tag{5}$$

where the value of a is given in terms of z by

$$z\sqrt{r^2 - z^2} + r^2 \sin^{-1}\frac{z}{r} + \frac{\pi r^2}{2} = a. \tag{6}$$

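The equation of $C_2$ is given by (5) with z negative. These equations do not
permit the solution of y in terms of x; however, they can be thrown into the
following parametric form with t as parameter:

$$
x = r \sin\left\{\frac{t}{2} + \frac{1}{2}\cos^{-1}\left[\frac{a/r^2 - t}{\sin t}\right]\right\}, \qquad
y = r \sin\left\{\frac{t}{2} - \frac{1}{2}\cos^{-1}\left[\frac{a/r^2 - t}{\sin t}\right]\right\},
\tag{4'}
$$

$$
x = r \sin\left\{\frac{t}{2} + \frac{1}{2}\cos^{-1}\left[\frac{2a/r^2 - \pi/2 + \cos t - t}{1 + \sin t}\right]\right\}, \qquad
y = r \sin\left\{\frac{t}{2} - \frac{1}{2}\cos^{-1}\left[\frac{2a/r^2 - \pi/2 + \cos t - t}{1 + \sin t}\right]\right\}.
\tag{5'}
$$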
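
The parametric form (4') can be checked numerically against (4). The following is only an illustrative sketch: the function names, the choice r = 1 and the level z = .6 are assumptions made for the example. It computes a from (6), generates points of $C_1$ from (4'), and substitutes them back into the left-hand side of (4).

```python
import math

def area_from_z(z, r=1.0):
    """Covered area a for a circle whose center is at distance z from one edge, eq. (6)."""
    return z * math.sqrt(r * r - z * z) + r * r * math.asin(z / r) + math.pi * r * r / 2

def point_on_c1(t, a, r=1.0):
    """Point (x, y) of curve C1 obtained from the parametric form (4')."""
    half_gap = 0.5 * math.acos((a / (r * r) - t) / math.sin(t))
    return r * math.sin(t / 2 + half_gap), r * math.sin(t / 2 - half_gap)

def lhs_eq4(x, y, r=1.0):
    """Left-hand side of (4); it should reproduce a on C1 where x^2 + y^2 > r^2."""
    def term(u):
        return u * math.sqrt(r * r - u * u) + r * r * math.asin(u / r)
    return term(x) + term(y)

a = area_from_z(0.6)  # level curve for z = .6 of r, one of the values plotted in Fig. 2
for t in (1.75, 1.9, 2.0, 2.15):  # values of t for which both x and y lie between 0 and r
    x, y = point_on_c1(t, a)
    print(f"t={t:.2f}  x={x:.4f}  y={y:.4f}  (4) gives {lhs_eq4(x, y):.6f}  a={a:.6f}")
```

For this level the admissible values of t run from roughly 1.70 (the point x = y) to about 2.21 (the point where the curve meets the line x = r).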


Since a may be treated as a parameter, equations (4) and (5), and hence (4')
and (5'), represent a system of curves $C_1$ and $C_2$. Unfortunately, equations
(4') and (5') are not convenient for integration purposes either, but they are
convenient for numerical work. This system of curves can be approximated
satisfactorily by means of simpler curves. One set of such approximating
curves is the following system of circles:

$$(x - r)^2 + (y - r)^2 = (r - z)^2, \qquad z > 0, \tag{7}$$

$$\left(x - \sqrt{r^2 - z^2}\right)^2 + \left(y - \sqrt{r^2 - z^2}\right)^2 = \left(-z + \sqrt{r^2 - z^2}\right)^2, \qquad z < 0. \tag{8}$$


Although inequalities may be obtained between the approximating and true
curves, these are of little value for determining the accuracy of essential moments
obtained by using these approximating curves; therefore the accuracy of these
approximating curves will be judged empirically by means of Fig. 2, in which
the true curves are plotted by means of (4') and (5') for z = .6, .3, 0, −.3, −.6,
−.9 of r with solid lines and the approximating circles (7) and (8) with broken
lines. Although the circles appear to fit poorly for relatively large positive
values of z, this is not serious because those values occur increasingly less often
than other values of z for a random circle and because the use of these circles
is confined to the rate of change of area bounded by these curves and the lines
x = r and y = r. Since the true curves are approaching the circles with
decreasing positive z, their rate of change of area would not differ much from that
for the circles even though the circles include larger areas for a given z. In the
paragraph following (11), further evidence will be presented to show that for the
computation of the first two moments of a, these curves give a good
approximation.



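For the purpose of obtaining the variance of a, consider the expected value of
$a^k$. Since the variable a may be thought of as the sum of three variables which
assume only the values 0, $\pi r^2$, and $0 < a < \pi r^2$, from (1) it follows that

$$E(a^k) = (\pi r^2)^k \frac{(A - 2r)^2}{B^2} + \int_{\frac{1}{2}\pi r^2}^{\pi r^2} a^k f_1(a) \, da + \int_0^{\frac{1}{2}\pi r^2} a^k f_2(a) \, da,$$

where $f_1(a)$ and $f_2(a)$ are the frequency functions for z > 0 and z < 0
respectively. Now since

$$P_1[a < \text{area} < \pi r^2] = \int_a^{\pi r^2} f_1(a) \, da,$$

and

$$P_2[0 < \text{area} < a] = \int_0^a f_2(a) \, da,$$

it follows from (1), (2), and (3) that

$$f_1(a) \, da = -dP_1 = -\frac{4 \, dS_1}{B^2} = \frac{4}{B^2}\left[A - r + \frac{d}{dz}\int_z^r y \, dx\right] dz,$$

and

$$f_2(a) \, da = dP_2 = \frac{4 \, dS_2}{B^2} = \frac{4}{B^2}\left[A - 2z + \frac{d}{dz}\int_0^{-z+\sqrt{r^2 - z^2}} y' \, dx'\right] dz.$$

Using the approximating curves (7) and (8), these integrals become:

$$\int_z^r y \, dx = r(r - z) - \frac{\pi}{4}(r - z)^2,$$

$$\int_0^{-z+\sqrt{r^2 - z^2}} y' \, dx' = \left(1 - \frac{\pi}{4}\right)\left(-z + \sqrt{r^2 - z^2}\right)^2.$$

Hence,

$$f_1(a) \, da = \frac{4}{B^2}\left[A - 2r\left(1 - \frac{\pi}{4}\right) - \frac{\pi}{2} z\right] dz,$$

and

$$f_2(a) \, da = \frac{4}{B^2}\left[A - 2z - 2\left(1 - \frac{\pi}{4}\right)\left(2\sqrt{r^2 - z^2} - \frac{r^2}{\sqrt{r^2 - z^2}}\right)\right] dz.$$

Hence,

$$
\begin{aligned}
E(a^k) = (\pi r^2)^k \frac{(A - 2r)^2}{B^2}
&+ \frac{4}{B^2}\int_0^r a^k \left[A - 2r\left(1 - \frac{\pi}{4}\right) - \frac{\pi}{2} z\right] dz \\
&+ \frac{4}{B^2}\int_{-r}^0 a^k \left[A - 2z - 2\left(1 - \frac{\pi}{4}\right)\left(2\sqrt{r^2 - z^2} - \frac{r^2}{\sqrt{r^2 - z^2}}\right)\right] dz.
\end{aligned}
$$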


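Substituting the value of a from (6), standard integrals give the following
values for k = 1 and k = 2:

$$E(a) = \frac{\pi r^4}{B^2}\left[\left(\frac{A}{r}\right)^2 + .13\right], \tag{9}$$

$$E(a^2) = \frac{\pi^2 r^6}{B^2}\left[\left(\frac{A}{r}\right)^2 - 1.15\left(\frac{A}{r}\right) + .46\right], \tag{10}$$

where certain constants involving π have been evaluated to two decimals.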
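
The coefficient 1.15 in (10) can be checked without reference to the approximating circles, since the corners of the quadrat affect only the last constant. The sketch below is an illustration under that assumption, with hypothetical function names: it integrates the square of the covered area (6) across one edge strip and combines the four strips with the interior term $(\pi r^2)^2 (A - 2r)^2/B^2$.

```python
import math

def covered_area(z, r):
    """Area of the circle inside the quadrat when only one edge cuts it, eq. (6)."""
    return z * math.sqrt(r * r - z * z) + r * r * math.asin(z / r) + math.pi * r * r / 2

def edge_strip_coefficient(r=1.0, steps=20000):
    """Coefficient of A r^5 in (B^2 / pi^2) E(a^2), corners neglected."""
    h = 2 * r / steps
    # Midpoint rule for the integral of a(z)^2 over -r < z < r.
    strip = sum(covered_area(-r + (i + 0.5) * h, r) ** 2 for i in range(steps)) * h
    # Four strips of length A, minus the cross term of (pi r^2)^2 (A - 2r)^2.
    return (4 * strip - 4 * math.pi ** 2 * r ** 5) / (math.pi ** 2 * r ** 5)

print(edge_strip_coefficient())  # close to -1.15
```

The integral can also be evaluated in closed form, which gives $-512/(45\pi^2)$ for this coefficient, in agreement with the value 1.15 quoted to two decimals.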

If circles with centers outside the region but overlapping the region were
also measured, then geometrical probability would give the following value
for E(a):

$$E(a) = \pi r^2 A^2/B^2. \tag{11}$$

Since in (1) only circles with centers inside the region are assumed measured,
E(a) will be only very slightly larger than this last value; consequently the
approximate result in (9) is only slightly in error. For a quadrat ten yards on
a side and plants averaging two yards in diameter, the error is in the
neighborhood of one tenth of one percent; consequently formula (10) may be expected
to be quite accurate as well. Another approximating system of curves lying
largely on the opposite side of the true curves from the circles gave formula
(10) with .46 replaced by .26, both of which have a negligible effect on $E(a^2)$
for ordinary applications.

Formula (10) was derived on the assumption that the same circle was thrown
repeatedly at random in the region. Consider now the situation when the
circle varies in size according to the frequency function f(r). Treating a and r
as two statistical variables, their joint frequency function may be expressed as:

$$f(a, r) = f(r) f(a \mid r),$$

where $f(a \mid r)$ is the frequency function of a when r has the fixed value r. Letting
$\mathcal{E}(a^k)$ represent the expected value of $a^k$ when r is permitted to vary according
to f(r),

$$\mathcal{E}(a^k) = \iint a^k f(a, r) \, da \, dr = \int f(r) \int a^k f(a \mid r) \, da \, dr = \int f(r) E(a^k) \, dr,$$

where all integrals are taken over the regions for which a and r are defined.
Consequently, from (10) and (11)

$$\mathcal{E}(a^2) = \frac{\pi^2}{B^2}\left[A^2 \nu_4 - 1.15 A \nu_5 + .46 \nu_6\right],$$

and

$$\mathcal{E}(a) = \pi A^2 \nu_2 / B^2, \tag{12}$$

where the ν's represent moments of r. Hence the variance of a is given by:

$$\sigma_a^2 = \frac{\pi^2}{B^2}\left[A^2 \nu_4 - 1.15 A \nu_5 + .46 \nu_6 - A^4 \nu_2^2 / B^2\right]. \tag{13}$$

Finally, let there be n quadrats, N circles whose radii vary according to f(r),
and let the total area of quadrat covered by the N circles be denoted by s.
Then

$$\mathcal{E}(s) = nN \mathcal{E}(a), \tag{14}$$

and

$$\sigma_s^2 = nN \sigma_a^2. \tag{15}$$

The purpose of measuring s is to use it to obtain an estimate of T, the total area
of the N circles. But

$$T = N \pi \nu_2. \tag{16}$$

Substituting the value of $\nu_2$ from (12) and using (14),

$$T = B^2 \mathcal{E}(s) / nA^2.$$

Hence an estimate of T will be given by

$$T_1 = B^2 s / nA^2. \tag{17}$$

Using (15) and (13), the variance of this estimate will be given by

$$\sigma_{T_1}^2 = \frac{\pi^2 B^2 N}{n A^2}\left[\nu_4 - 1.15 \frac{\nu_5}{A} + .46 \frac{\nu_6}{A^2} - \frac{A^2}{B^2} \nu_2^2\right]. \tag{18}$$
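
As a check on the algebra leading from (13), (15), and (17) to (18), the two routes can be compared numerically. This is a minimal sketch; the function names and all numerical values are hypothetical and serve only as an illustration.

```python
import math

def var_single_area(A, B, nu2, nu4, nu5, nu6):
    """Variance of the area a covered in one quadrat by one plant, formula (13)."""
    return (math.pi ** 2 / B ** 2) * (
        A ** 2 * nu4 - 1.15 * A * nu5 + 0.46 * nu6 - A ** 4 * nu2 ** 2 / B ** 2
    )

def var_T1(A, B, n, N, nu2, nu4, nu5, nu6):
    """Variance of the coverage estimate T1 = B^2 s / (n A^2), formula (18)."""
    return (math.pi ** 2 * B ** 2 * N / (n * A ** 2)) * (
        nu4 - 1.15 * nu5 / A + 0.46 * nu6 / A ** 2 - A ** 2 * nu2 ** 2 / B ** 2
    )

# Hypothetical values, chosen only for illustration.
A, B, n, N = 10.0, 1000.0, 20, 5000
nu2, nu4, nu5, nu6 = 1.11, 1.81, 2.62, 4.07

direct = var_T1(A, B, n, N, nu2, nu4, nu5, nu6)
# Var(T1) = (B^2 / (n A^2))^2 * Var(s), with Var(s) = n N Var(a) from (15).
via_13_and_15 = (B ** 2 / (n * A ** 2)) ** 2 * n * N * var_single_area(A, B, nu2, nu4, nu5, nu6)
print(direct, via_13_and_15)
```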

3. Quadrat abundance. In this method the sampler merely counts the number
of plants of the given species in each quadrat. Although this method was
designed to estimate the total number of plants, it may be adapted to estimate
total coverage as well. Since it is the practice to count a plant as lying in the
quadrat only if its stem is in the quadrat, the probability that this event will occur
is given by:

$$P_q = A^2/B^2. \tag{19}$$

Since there are n quadrats and N circles, the number of circles with centers
lying in quadrats, which will be denoted by s, will follow the binomial
distribution; hence

$$\mathcal{E}(s) = nN P_q, \tag{20}$$

and

$$\sigma_s^2 = nN P_q (1 - P_q). \tag{21}$$

From (16) and (20) it follows that

$$T = \pi \nu_2 \mathcal{E}(s) / n P_q.$$

Therefore an estimate of T will be given by

$$T_2 = \pi B^2 m_2 s / nA^2, \tag{22}$$

where $m_2$ is a sample estimate of $\nu_2$ obtained by measuring the diameters of k
plants and calculating their mean area. Since $m_2$ and s are independent, a
standard formula for the variance of a product of two independent variables
may be applied to give

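$$\sigma_{T_2}^2 = \frac{\pi^2 B^4}{n^2 A^4}\left[\mathcal{E}(m_2^2) \sigma_s^2 + \mathcal{E}^2(s) \sigma_{m_2}^2\right].$$

But

$$\sigma_{m_2}^2 = \frac{\nu_4 - \nu_2^2}{k},$$

and

$$\mathcal{E}(m_2^2) = \frac{\nu_4 - \nu_2^2}{k} + \nu_2^2.$$

Consequently, with the aid of (19), (20), and (21)

$$\sigma_{T_2}^2 = \pi^2 N \left\{\frac{B^2 - A^2}{n A^2}\left[\nu_2^2 + \frac{\nu_4 - \nu_2^2}{k}\right] + \frac{N}{k}\left[\nu_4 - \nu_2^2\right]\right\}. \tag{23}$$
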


4. Quadrat frequency. In this method the sampler records the number of
quadrats observed and the number of those quadrats which contained at least
one plant of the given species. Given N plants, the probability p that at least
one of them will be found in a given quadrat is given by

$$p = 1 - (1 - P_q)^N,$$

where $P_q$ is given in (19). For n quadrats the expected number of quadrats
in which at least one plant will be found is therefore np. Letting w represent
the number of such quadrats,

$$\mathcal{E}(w) = n\left[1 - (1 - P_q)^N\right].$$

Solving for N,

$$N = \log\left[1 - \frac{\mathcal{E}(w)}{n}\right] \Big/ \log\left[1 - P_q\right].$$

Consequently, from (16) an estimate of T will be given by

$$T_3 = \pi m_2 \log\left[1 - \frac{w}{n}\right] \Big/ \log\left[1 - P_q\right].$$

Neither the mean nor the variance of $T_3$ will exist because $T_3$ is a discrete variable
which becomes infinite for w = n. Unless the density of the species is very low,
values of w near n will occur quite often and hence cause $T_3$ to vary widely.
Consequently the frequency method will be inferior to the abundance method,
except when the mean density is low, in which case the abundance method is
practically as easy to apply. Because the frequency method is obviously inferior
to the abundance method, it will not be considered further here.
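
The behavior of the frequency estimate near w = n can be seen in a short sketch. The function name and the survey dimensions below are assumptions chosen only for illustration; the estimate itself is the logarithmic formula above with $P_q = A^2/B^2$.

```python
import math

def estimate_T3(w, n, m2, A, B):
    """Frequency-method estimate of total coverage; infinite when w == n."""
    P_q = A ** 2 / B ** 2
    if w >= n:
        return math.inf
    return math.pi * m2 * math.log(1 - w / n) / math.log(1 - P_q)

# Hypothetical survey: 50 quadrats 10 units on a side in a region 1000 units on a side.
n, m2, A, B = 50, 1.1, 10.0, 1000.0
for w in (10, 40, 49, 50):
    print(w, estimate_T3(w, n, m2, A, B))
```

The estimate grows without bound as the number of occupied quadrats approaches n, which is the reason neither its mean nor its variance exists.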


5. Transect coverage. In this type of sampling a line is laid down and the
length of line covered by a plant of the species in question is recorded. Let
there be n such lines, each L units in length.

If a circle of radius r is thrown at random in the region, it will cross a line
only if its center lies within the subregion, indicated in Fig. 3, composed of a
rectangle of width 2r and length L with semi-circular ends. From this figure it
is clear that the probability of the circle intersecting some positive length less
than z of the line is given by four times the shaded area $S_3$, divided by the area
of the region. From this same diagram the following equations of the indicated
curves result:

$$
\begin{aligned}
C_1&: \quad \left(x - \tfrac{L}{2}\right)^2 + y^2 = r^2, & \tfrac{L}{2} < x < \tfrac{L}{2} + r, \\
C_2&: \quad y = \sqrt{r^2 - z^2/4}, \\
C_3&: \quad \left(x - \tfrac{L}{2} + z\right)^2 + y^2 = r^2, & \tfrac{L}{2} - \tfrac{z}{2} < x < \tfrac{L}{2} - z + r.
\end{aligned}
$$


Applying geometrical probability,

$$P_1[0 < \text{intercepted length} < z] = \frac{4 S_3}{B^2} = \int_0^z f(z) \, dz,$$

where f(z) is the frequency function for z. But

$$S_3 = \frac{L}{2}\left[r - \sqrt{r^2 - z^2/4}\right] + \frac{z}{2}\sqrt{r^2 - z^2/4} + \int_0^{z/2} \sqrt{r^2 - x^2} \, dx.$$

Standard integrals give

$$S_3 = \frac{L}{2}\left[r - \sqrt{r^2 - z^2/4}\right] + \frac{3}{4} z \sqrt{r^2 - z^2/4} + \frac{r^2}{2} \sin^{-1}\frac{z}{2r}.$$

Consequently,

$$f(z) \, dz = dP_1 = \frac{4}{B^2}\left(\frac{8r^2 + Lz - 3z^2}{8\sqrt{r^2 - z^2/4}}\right) dz.$$

From this relation the following moments are readily obtained:

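$$E(z) = \pi r^2 L / B^2,$$

$$E(z^2) = \left[\tfrac{16}{3} L r^3 - \pi r^4\right] / B^2.$$

For variable r these formulas become:

$$\mathcal{E}(z) = \pi \nu_2 L / B^2,$$

$$\mathcal{E}(z^2) = \left[\tfrac{16}{3} L \nu_3 - \pi \nu_4\right] / B^2,$$

$$\sigma_z^2 = \left[\tfrac{16}{3} L \nu_3 - \pi \nu_4\right] / B^2 - \pi^2 \nu_2^2 L^2 / B^4.$$

Let ζ denote total z for N circles and n lines; then

$$\mathcal{E}(\zeta) = nN \pi \nu_2 L / B^2, \tag{24}$$

and

$$\sigma_\zeta^2 = nN \left\{\left[\tfrac{16}{3} L \nu_3 - \pi \nu_4\right] / B^2 - \pi^2 \nu_2^2 L^2 / B^4\right\}.$$

From (16) and (24)

$$T = B^2 \mathcal{E}(\zeta) / nL.$$

Hence an estimate of T will be given by

$$T_4 = B^2 \zeta / nL, \tag{25}$$

and its variance will be given by

$$\sigma_{T_4}^2 = \frac{N}{n}\left\{\frac{B^2}{L^2}\left[\tfrac{16}{3} L \nu_3 - \pi \nu_4\right] - \pi^2 \nu_2^2\right\}. \tag{26}$$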
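
The moments of z for a fixed radius can be verified by integrating the frequency function f(z) numerically. The sketch below is an illustration only, with a hypothetical function name and hypothetical dimensions; it uses the substitution z = 2r sin θ to remove the singularity of f(z) at z = 2r.

```python
import math

def moment_z(k, r, L, B, steps=20000):
    """Integral of z^k f(z) dz over 0 < z < 2r, by the midpoint rule in theta."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        z = 2 * r * math.sin(theta)
        # With z = 2 r sin(theta), dz / sqrt(r^2 - z^2/4) = 2 d(theta).
        total += z ** k * (4 / B ** 2) * (8 * r * r + L * z - 3 * z * z) / 8 * 2 * h
    return total

r, L, B = 1.0, 50.0, 1000.0  # hypothetical dimensions
print(moment_z(1, r, L, B), math.pi * r ** 2 * L / B ** 2)
print(moment_z(2, r, L, B), (16 / 3 * L * r ** 3 - math.pi * r ** 4) / B ** 2)
```

The zeroth moment of the same integrand equals $[2rL + \pi r^2]/B^2$, the probability that the circle meets the line at all.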

6. Transect abundance. Since the probability, $P_t$, of a circle of radius r
intersecting a line of length L is the area of the band with semi-circular ends
indicated in Fig. 3, divided by the area of the region,

$$P_t = [2rL + \pi r^2]/B^2.$$

Hence, letting s represent the total number of intersections, as in the case of
quadrat abundance,

$$E(s) = nN P_t,$$

$$E(s^2) = nN P_t (1 - P_t) + n^2 N^2 P_t^2,$$

$$\mathcal{E}(s) = nN [2L\nu_1 + \pi \nu_2]/B^2, \tag{27}$$

$$\mathcal{E}(s^2) = \frac{nN}{B^4}\left\{B^2 [2L\nu_1 + \pi \nu_2] + [nN - 1][4L^2 \nu_2 + 4\pi L \nu_3 + \pi^2 \nu_4]\right\}.$$

For simplicity of formula if nN − 1 is replaced by nN, the variance of s becomes

$$\sigma_s^2 = \frac{nN}{B^4}\left\{B^2 [2L\nu_1 + \pi \nu_2] + nN\left[4L^2 (\nu_2 - \nu_1^2) + 4\pi L (\nu_3 - \nu_1 \nu_2) + \pi^2 (\nu_4 - \nu_2^2)\right]\right\}. \tag{28}$$
From (16) and (27)

$$T = \frac{\pi \nu_2 B^2 \mathcal{E}(s)}{n[2L\nu_1 + \pi \nu_2]}.$$

Hence an estimate of T will be given by

$$T_5 = \frac{\pi B^2 s}{n[\pi + 2L\alpha]},$$

where α is an estimate of $\nu_1/\nu_2$. In order to obtain a satisfactory estimate of
$\nu_1/\nu_2$, data for the distribution of diameters of common California shrubs were
analyzed. It was found that Pearson's type three curve gave an excellent fit.
Since the moments of this type distribution are given by

$$\nu_m = \mu^m \prod_{i=1}^{m-1} \left[1 + iV^2\right], \tag{29}$$

where μ is the mean and V is the coefficient of variation, σ/μ, then $\nu_1/\nu_2 = 1/\mu\theta$,
where $\theta = 1 + V^2$, and the above estimate becomes

$$T_5 = \frac{B^2 s}{n}\left[1 - \frac{2L}{(2L + \pi \mu \theta)(1 + \varphi)}\right], \tag{30}$$

where $\varphi = \pi \theta [\bar{r} - \mu]/[\pi \mu \theta + 2L]$ and where $1/\bar{r}$ is chosen as an estimate of
1/μ. Since $\bar{r}$ will be approximately normally distributed for samples
considerably smaller than those usually taken to find $\bar{r}$, assume that φ is normally
distributed with mean zero and variance $\pi^2 \theta^2 \sigma^2 / k[\pi \mu \theta + 2L]^2$. Since L is large relative
to σ and since k will usually exceed twenty-five, this variance is very small,
and hence the probability of φ exceeding one numerically is extremely small.
Although the value φ = −1 is theoretically possible on the normality assumption,
such a value would not permit the existence of either the mean or variance of
1/[1 + φ]. However, if φ is restricted to a range of, say, ten standard deviations
about zero, then |φ| < 1 for ordinary conditions and the variance will exist.
Further, because φ assumes such small values, with this finite range the variance
of 1/[1 + φ] is the same as the variance of φ itself if higher powers in this variance
are neglected. Since s and φ are independent, the same product formula that
was used for quadrat abundance may be employed here, together with the
various approximations indicated above, to yield

$$
\begin{aligned}
\sigma_{T_5}^2 = \left(\frac{\pi \mu \theta N}{2L + \pi \mu \theta}\right)^2 \Bigg\{
&\left[\frac{B^2}{nN}(2L\nu_1 + \pi \nu_2) + 4L^2 (\nu_2 - \nu_1^2) + 4\pi L (\nu_3 - \nu_1 \nu_2) + \pi^2 (\nu_4 - \nu_2^2)\right]
\left[1 + \frac{4L^2 V^2}{k(2L + \pi \mu \theta)^2}\right] \\
&+ \frac{4L^2 V^2}{k(2L + \pi \mu \theta)^2}\left[2L\nu_1 + \pi \nu_2\right]^2 \Bigg\}.
\end{aligned}
\tag{31}
$$

7. Comparison of methods. Formulas (18) and (23) may be compared for
relative accuracy of these two quadrat methods of measuring coverage.
Formulas (26) and (31) may be compared for relative accuracy of these two transect
methods of measuring coverage. Finally, formulas (18) and (26), and formulas
(23) and (31), may be compared to determine what length transect will give the
same accuracy as a quadrat of given size. All such comparisons will necessarily
have to be done numerically by considering typical values for the parameters
involved. The moments occurring in these formulas are expressible by means
of (29) in terms of μ and V if the form of f(r) is that assumed here. For the
data analyzed to determine f(r) it was found that V was approximately 1/3.
These numerical comparisons will not be made here.

The question of which type of sampling method should be employed now
becomes a question of balancing relative ease or cost of sampling against the size
of samples needed to produce equivalent accuracy as determined by means of
these formulas. If total frequency is desired rather than total coverage, these
formulas may be altered to handle this situation as well.
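
A comparison of this kind might be organized as in the following sketch, which is not part of the original analysis. It assumes type-three moments from (29) with V = 1/3, takes the variance of the quadrat coverage estimate from (18) and that of the transect coverage estimate from (26), and finds by bisection the transect length L whose variance matches a quadrat of side A. The function names and the values of μ, A, B, n, and N are hypothetical.

```python
import math

def type3_moment(m, mu, V):
    """nu_m from (29): mu^m times the product of (1 + i V^2) for i = 1, ..., m-1."""
    nu = mu ** m
    for i in range(1, m):
        nu *= 1 + i * V ** 2
    return nu

def var_quadrat_coverage(A, B, n, N, nu):
    """Formula (18); nu maps the order m to the moment nu_m."""
    return (math.pi ** 2 * B ** 2 * N / (n * A ** 2)) * (
        nu[4] - 1.15 * nu[5] / A + 0.46 * nu[6] / A ** 2 - A ** 2 * nu[2] ** 2 / B ** 2
    )

def var_transect_coverage(L, B, n, N, nu):
    """Formula (26)."""
    return (N / n) * (
        (B ** 2 / L ** 2) * (16 / 3 * L * nu[3] - math.pi * nu[4]) - math.pi ** 2 * nu[2] ** 2
    )

def matching_length(A, B, n, N, nu, lo=5.0, hi=1000.0):
    """Bisection for L with equal variances; assumes (26) decreases in L on [lo, hi]."""
    target = var_quadrat_coverage(A, B, n, N, nu)
    for _ in range(200):
        mid = (lo + hi) / 2
        if var_transect_coverage(mid, B, n, N, nu) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, V = 1.0, 1 / 3                 # mean radius (hypothetical) and V as reported above
A, B, n, N = 10.0, 1000.0, 20, 5000  # hypothetical survey dimensions
nu = {m: type3_moment(m, mu, V) for m in range(1, 7)}
L_equal = matching_length(A, B, n, N, nu)
print(L_equal)
```

Because n and N enter (18) and (26) only through the common factor N/n, the matching length in this sketch depends only on A, B, and the moments.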

Readers are invited to submit to the Secretary of the Institute news items of general interest.

Personal Items

Dr. Charles C. Wagner has been named Assistant Dean of the School of Liberal
Arts at the Pennsylvania State College.

Miss Ruth E. Jolliffe has taken a position in the Graphic Analysis Department
of Bell Aircraft Corporation.

Mr. H. F. Hebley has been appointed Director of Research for the Pittsburgh
Coal Company.

Lt. F. W. Dresch, USNR, U. S. Naval Proving Ground, Dahlgren, Virginia,
has been promoted to the rank of Lieutenant Commander.

Mr. George F. Mayer is a Sergeant in the United States Army and is stationed
at Fort Lewis, Washington.

Captain A. C. Cohen, Jr. of Picatinny Arsenal has been promoted to the rank
of Major.


New Members 

The following persons have been elected to membership in the Institute:

Bassford, Horace R. B.A. (Trinity Coll.) Actuary, Metropolitan Life Insurance Co., 1
Madison Ave., New York, N. Y.

Benson, Kathryn E. M.S. (Washington) Teaching Asst., Univ. of Calif., Berkeley, Calif.

Blackadar, Walter L. B.A. (McMaster) Asso. Actuary, Equitable Life Assurance Society,
393 Seventh Ave., New York, N. Y.

Buros, Asso. Prof. Oscar K. M.A. (Columbia) Rutgers Univ. (on leave), Captain, Signal
Corps, A.U.S. 301 S. Courthouse Rd., Arlington, Va.

Clinedinst, William O. M.E. (Carnegie Inst. Tech.) Eng., National Tube Co., Frick
Bldg., Pittsburgh, Pa.

Curry, Prof. Haskell B. Ph.D. (Göttingen) Penna. State Coll., State College, Pa. 670S
N. Sixth St., Philadelphia, Pa.

Dix, Margaret J. M.A. (Rice Institute) Sec., Univ. of Calif., Berkeley, Calif.

Groth, Alton O. M.S. (Iowa) Asst. Actuary, Equitable Life Insurance Co. of Iowa,
Des Moines, Iowa.

Gurland, John M.A. (Toronto) Instr., Univ. of Toronto, Toronto, Canada. 97 Metcalfe
St., Ottawa.

Humm, Doncaster G. Ph.D. (Southern California) 1203 Commercial Exchange Bldg.,
416 W. Eighth St., Los Angeles, Calif.

Jahn, Fred S. M.S. (Florida) General Manager, New Plastic Corp., 1017 N. Sycamore,
Hollywood, Calif.

Jeming, Joseph M.A. (Columbia) Captain, Army Air Corps. SOW Valentine Ave., New
York, N. Y.

Kavanagh, Arthur J. Ph.B. (Yale) Physicist, Spencer Lens Co., Buffalo, N. Y. 19 Boat St.

Kennedy, Evelyn M. M.A. (Cincinnati) Industrial Economist, War Production Board,
Washington, D. C. Hb% Fairmont St., NW.

Lehmann, Eric L. M.A. (California) Asso., Univ. of Calif., Berkeley, Calif. 3514
Piedmont Ave.

Lew, Edward A. M.A. (Columbia) Asst. Actuary, Metropolitan Life Insurance Co., 1
Madison Ave., New York, N. Y.

Murphy, Ray D. A.B. (Harvard) Vice Pres. and Actuary, Equitable Life Assurance
Society, New York, N. Y. 23 Godfrey Rd., Upper Montclair, N. J.

Myers, James E. A.B. (Michigan) Leader-Statistical Analysis Group, Naval Res. Lab.,
Washington, D. C. 3014 Nichols Ave., SE.

O'Connor, Harry W. M.B.A. (Harvard) Stat., Sperry Gyroscope Co. Inc., Brooklyn,
N. Y. 37 Meadow Woods Rd., Great Neck.

Painter, Frank M., Jr. M.B.A. (Harvard) Statistics Supervisor, Sperry Gyroscope Co.,
Brooklyn, N. Y. 843 Blind St.

Salkind, William M.B.A. (Chicago) Asso. Stat., U. S. Dept. of Agric., Washington, D. C.
$149 K St., NW.

Simon, Leon G. Pension Consultant, 225 W. 34 St., New York, N. Y.

Stewart, Oscar F. Statistics Supervisor, Sperry Gyroscope Co., Brooklyn, N. Y.

Tucker, Ledyard R. B.S. (Colorado) Res. Asso., Univ. of Chicago, Chicago, Ill. 51,55
Greenwood Ave.

Ullman, Joseph L. B.A. (Buffalo) Teaching Fellow, Mass. Inst. of Tech., Cambridge,
Mass. S97 Jefferson Ave., Buffalo, N. Y.


REPORT ON THE WASHINGTON MEETING OF THE INSTITUTE 

The fifteenth meeting of the Institute of Mathematical Statistics was held at
George Washington University, June 17-19, 1943. About 200 persons, including
the following sixty-one members of the Institute, attended one or more of the
three evening sessions:

T. W. Anderson, Jorge Arias, R. O. Been, H. R. Beilinson, B. M. Bennett, Richard
Berger, Joseph Berkson, Felix Bernstein, Archie Blake, Dorothy S. Brady, W. G. Cochran,
J. B. Coleman, Gertrude Cox, J. H. Curtiss, G. B. Dantzig, Besse B. Day, Robert Dorfman,
H. F. Dorn, W. F. Elkin, W. D. Evans, R. H. Fadner, L. R. Frankel, M. A. Girshick, Harry
H. Goode, C. H. Graves, T. N. Greville, F. E. Grubbs, Louis Guttman, Morris H. Hansen,
W. A. Hendricks, W. N. Hurwitz, Walter Jacobs, Rachel M. Jenss, A. J. King, G. B. King,
Lila F. Knudsen, H. S. Konijn, Solomon Kullback, H. G. Landau, J. E. Lieberman, W. G.
Madow, Sophie Marcuse, J. W. Mauchly, A. M. Mood, Harold Nisselson, Monroe L. Norden,
H. W. Norton, A. C. Rosander, David Rosenblatt, P. J. Rulon, Marion Sandomire, W. A.
Shelton, Harry Shulman, J. H. Smith, G. W. Snedecor, F. F. Stephan, Alice Sternberg,
Benjamin Tepping, J. W. Tukey, C. R. M. Tuttle, F. M. Weida.

The following program, arranged by Dr. W. G. Madow, was held:

THURSDAY, JUNE 17 AT 8:00 P.M.

APPLICATIONS OF SAMPLING THEORY
Chairman, Professor Frank M. Weida, George Washington University

1. Some Recent Developments in the Application of Sampling Theory in Agriculture
Arnold J. King, Iowa State College and Department of Agriculture; Walter A.
Hendricks, North Carolina State College and Department of Agriculture

2. The Relative Efficiency of Block Samples in Housing Surveys
Lester R. Frankel and William J. Cobb, Bureau of the Census

3. The Optimum Size of Sampling Units
Dorothy Cruden and Alice Sternberg, Bureau of the Census

FRIDAY, JUNE 18 AT 8:00 P.M.

RECENT DEVELOPMENTS IN STATISTICAL THEORY
Chairman, Professor George W. Snedecor, Iowa State College

1. On Some Recent Developments in Sampling Theory
Morris H. Hansen, William N. Hurwitz, and William G. Madow, Bureau of the
Census and Office of Price Administration

2. On the Variance of Estimates Arising from Stratified Samples
Frederick F. Stephan, War Manpower Commission

3. Statistical Techniques for the Comparison of Different Scales of Measurement
William G. Cochran, Iowa State College

4. Adjustments for Differential Refusal Rates in Samples of Human Populations
Jerome Cornfield, Bureau of Labor Statistics

5. On the Verification of Weather Forecasts
Horace W. Norton, Weather Bureau

SATURDAY, JUNE 19 AT 8:00 P.M.

SOME PROBLEMS IN STATISTICS
Chairman, Colonel Leslie E. Simon, War Department

1. The Application of Statistical Methods in Acceptance Inspection
Harold Beilinson, War Department

2. The Distribution of the Radial Standard Deviation
Captain Frank E. Grubbs, War Department

3. Some Results in Tests of Randomness
M. A. Girshick, Department of Agriculture

4. Corrections for Groupings
John H. Smith, Bureau of Labor Statistics

5. On Group Blood Testing
Robert E. Dorfman, Office of Price Administration

Edwin G. Olds,
Secretary


REPORT ON THE FIRST MEETING OF THE PITTSBURGH 
CHAPTER OF THE INSTITUTE 

The first meeting of the Pittsburgh Chapter of the Institute of Mathematical
Statistics was held at Carnegie Institute of Technology on Saturday, June 19,
1943. Thirty-six persons attended the meeting, including the following ten
members of the Institute:

Shirley Bernstein, M. A. Brumbaugh, Karl Fetters, H. J. Hand, G. E. Niver, F. G.
Norris, E. G. Olds, R. F. Passano, E. M. Schrock, R. W. Shephard.

Morning and afternoon sessions were devoted to a round-table discussion of
present industrial uses of statistical methods. Mr. R. F. Passano, Bethlehem
Steel Co., led the discussion. Mr. F. G. Norris, Wheeling Steel Corp., acted as
chairman of the sessions.

The Pittsburgh Chapter was formed from the Society of Industrial Quality
Statisticians, which has held meetings at Carnegie Institute of Technology since
1941, with the object of providing a symposium for those interested in industrial
applications. The Constitution of the Pittsburgh Chapter was ratified at the
meeting. The object of the Chapter is to foster the advancement of
mathematical statistics and to promote its application to industrial problems.
The following officers for the Chapter for 1943 were elected:

President, F. G. Norris, Wheeling Steel Corp.
Vice President, K. L. Fetters, Carnegie Institute of Technology
Sec.-Treas., H. J. Hand, National Tube Co.
Sponsor, E. G. Olds, Carnegie Institute of Technology
fieri fa*M,Il, F, Passano, Ml*® Steel Co, 

J, Manim, Westmghouae Eleetric k Mfg. Co, 

Howard Ham, 

By Henry Scheffé
Princeton University

CONTENTS

1. Introduction

Part I. Non-parametric tests

2. The randomization method of obtaining similar regions
3. Goodness of fit. Randomness
4. The problem of two samples
5. Independence
6. Analysis of variance

Part II. Non-parametric estimation

7. Classical results on point estimation
8. Confidence intervals for an unknown median, for the difference of medians
9. Confidence limits for an unknown distribution function
10. Tolerance limits

Part III. Toward a general theory

11. The criterion of consistency
12. Likelihood ratio tests
13. Wald's formulation of the general problem of statistical inference


1. Introduction. In most of the problems of statistical inference for which
we possess solutions the distribution function is assumed to depend in a known
way on certain parameters. The values of the parameters are unknown, and the
problems are to make inferences about the unknown parameter values. We
refer to this as the parametric case. Under it falls all the theory based on
normality assumptions.

Only a very small fraction of the extensive literature of mathematical
statistics is devoted to the non-parametric case, and most of this is of the last
decade. We may expect this branch to be rapidly explored, however: the
prospects of a theory freed from specific assumptions about the form of the
population distribution should excite both the theoretician and the practitioner,
since such a theory might combine elegance of structure with wide applicability.
The process of development will no doubt inspire some mathematical attacks of
considerable abstractness. There are already signs that more number-theoretic
problems and measure-theoretic problems will enter our subject through this
door, and perhaps even some topological ones. Some ability to think in terms of
functionals, function spaces, and metrization of function spaces will be useful in
attempting general theories of "best" tests and estimates. Toward such
abstract phases of the development the attitude of the practical statistician should
be one of tolerance, for the new theory already promises to give him many new
tools which are both simpler and of wider use.

1 Parts of this paper were used in an invited address given under the title "Statistical
inference when the form of the distribution function is unknown" before the meeting of the
Institute of Mathematical Statistics on September 12, 1943 in New Brunswick, N. J.

While the maturity of the non-parametric theory is still in the future, it is well
to remark that its beginnings go relatively far back. Of our most famous tests,
such as Pearson's χ²-test, Student's test, and Fisher's analysis of variance tests,
the oldest concerns a non-parametric problem: In 1900 Karl Pearson proposed
his χ²-criterion to test the goodness of fit of a theoretical distribution to
observations, and in 1911 he extended his χ²-method to the problem of two samples.
The first of these problems may be regarded as non-parametric if the choice of
the theoretical distribution is not based on calculations from the data, and the
second is without doubt a non-parametric problem. R. A. Fisher treated an
analysis of variance problem non-parametrically at least as early as 1925, for in
the first edition of his Statistical Methods for Research Workers we find the sign
test. General formulations of the problems of statistical inference, and criteria
for "good" and "best" solutions² have been advanced by R. A. Fisher, Neyman,
E. S. Pearson, and Wald. These general theories were all strictly parametric
until 1941 when Wald proposed one sufficiently broad to cover the non-parametric
case.

We now introduce some notation to which we shall adhere throughout this
paper. Statistical inferences are based on measurements. The total number
of measurements will always be denoted by n. We conceive of n random
variables $X_1, X_2, \cdots, X_n$ on which the measurements are made. The domain
of each $X_j$ can always be taken to be a set of real numbers. If vector random
variables occur, the $X_j$ will denote components. The cumulative distribution
function (c.d.f.) of the random variables will be written $F_n(x_1, x_2, \cdots, x_n)$;
this is the probability that all $X_j < x_j$ simultaneously. The c.d.f. $F_n$ is then
always defined in a complete n-dimensional Euclidean space W, called the
sample space; W is the space of points $E = (x_1, x_2, \cdots, x_n)$. The sample
point with the random coordinates $X_1, \cdots, X_n$ will be denoted by E.

In describing the validity of specific non-parametric tests and estimates in the
sequel it will be convenient to refer to the following classification³ of univariate
c.d.f.'s F(x): $\Omega_0$ is the class of all F. $\Omega_2$ is the class of all continuous F. $\Omega_3$ is
the class of all absolutely continuous F, that is, those F for which there exists a
probability density function f(x), so that

$$F(x) = \int_{-\infty}^{x} f(t) \, dt.$$

$\Omega_4$ consists of all F which may be written in the above form with f continuous.

2 For a bibliography see [22].
3 The notation follows [31].

Part I. Non-parametric Tests

2. The randomization method of obtaining similar regions. In any problem
of statistical inference it is assumed that the c.d.f. $F_n$ of the measurements is a
member of a given class Ω of n-variate distribution functions; we write $F_n \in \Omega$.
Ω is called the class of admissible $F_n$. If Ω is a k-parameter family of functions
the problem is called parametric, otherwise, non-parametric. A statistical
hypothesis H is a statement that $F_n \in \omega$, where ω is a given subclass of Ω. A test
of the hypothesis H consists of choosing a Borel region w in the sample space
W and rejecting H if and only if the sample point E falls in w; w is called the
critical region of the test.

The choice of the critical region w is usually⁴ made as follows: A positive
constant α (ordinarily about .01 or .05) is chosen and called the significance level of
the test. If regions w exist for which $\Pr[E \in w \mid F_n]$ (the probability that the
sample point E falls in w, calculated from the c.d.f. $F_n$) is equal to α for all $F_n \in \omega$,
then the choice of critical region is usually limited to this class. Such regions
are very important in the theory of testing hypotheses, and it is convenient to
have a name for them: Following the terminology of Neyman [22] in the
parametric case we shall call them similar to the sample space with respect to all $F_n$
in ω, or more briefly, similar regions. A similar region is then a region w for
which $\Pr[E \in w \mid F_n]$ is the same constant for all $F_n \in \omega$. The advantage of
using similar regions as critical regions is that the risk of rejecting the hypothesis
when it is true (type I error) is controlled: no matter what member of ω the
unknown $F_n$ happens to be, the probability of rejection of the (true) hypothesis
is exactly α. We remark here that the problem of the existence and structure
of similar regions in the parametric case has been treated only under very heavy
restrictions and must be considered still mostly unsolved, whereas we shall see
later that in the non-parametric case it promises to be relatively simple.
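
A small illustration of a similar region is offered by the sign test mentioned in the introduction. The sketch below is an assumption-laden example rather than part of the original text: it supposes n independent measurements, each with a continuous c.d.f. having median zero under the hypothesis, and takes as critical region the set of sample points with at least c positive coordinates. The function name is hypothetical.

```python
from math import comb

def sign_test_size(n, c):
    """Probability of at least c positive signs among n when each sign is + with probability 1/2."""
    return sum(comb(n, k) for k in range(c, n + 1)) / 2 ** n

# The size is the same for every admissible continuous distribution with median zero,
# so the region is similar; only certain values of alpha are attainable.
for c in (8, 9, 10):
    print(c, sign_test_size(10, c))
```

Because the count of positive signs has the same binomial distribution whatever the continuous distribution may be, the rejection probability under the hypothesis does not depend on the unknown c.d.f., although the attainable significance levels form a discrete set.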

When similar regions exist for a chosen α there is usually a large family of
them. Ideally the choice of the critical region w from the family of similar
regions would be based on a complete knowledge of two functionals of $F_n$ for
$F_n \in \Omega - \omega$, that is, for those $F_n$ corresponding to the various admissible ways in
which the hypothesis can be false: the first, the probability of rejection (of
avoiding a type II error), namely $\Pr[E \in w \mid F_n]$, called the power function of w, and
the second, the relative importance of rejection in the concrete situation in which
the test is to be applied. In other words, one would like to choose the w with the
power function "best" for the very specific problem at hand. However, little
has been done along this line in the non-parametric case, and, as we shall note
below, the choice of w from the family of similar regions is usually made by
means of a statistic chosen on intuitive grounds.

A general method of obtaining similar regions, which we shall call the
randomization method, will now be described. The credit for originating this
method and envisioning its wide applicability belongs to R. A. Fisher, who first
used it in 1925 [3]. Consider the set S of permutations on the coordinates
x_1, ..., x_n which leave invariant all the c.d.f.’s F_n in ω. Suppose the
number of permutations in the set S is s; then s divides n!. Now define for any
point E in W a corresponding set {E'} of s points obtained by making on the
coordinates of E the permutations of the set S. The value of the c.d.f. F_n is
then the same at all s points E' generated by E, for all E ∈ W and all F_n ∈ ω.
The s points of {E'} will be distinct unless the point E lies in a certain region
W_0; W_0 depends on the set S of permutations determined by the class ω, and will
always be contained in the union of all diagonal hyperplanes x_i = x_j (i ≠ j).
A critical region w is constructed by the randomization method by choosing a
positive integer q < s, and for every E not in W_0, putting q points of the
corresponding set {E'} in w and the remaining s − q points outside w, by any rule
whatever, just so w is a Borel set. We shall also say that a Borel set w is
obtained by the randomization method if it has the structure just described
except on a (Borel) subset w_0 of w having the property Pr{E ∈ w_0 | F_n} = 0 for
all F_n ∈ ω. It may be shown by the methods used elsewhere [31] by the writer
that if ω is a class of continuous c.d.f.’s then the region w thus obtained is a
similar region with α = q/s; furthermore, that under mild restrictions (roughly,
that the boundary of w be a sufficiently “thin” set), at least for certain
classes ω, this is the only method of obtaining similar regions.

* Another approach to the choice of critical region will be described in section 13.

One might call the set {E'} of points corresponding to E the subpopulation of
points “equally likely” under the null hypothesis H, but we shall call {E'} simply
the subpopulation corresponding to E. The decision as to which q of the s points
of the subpopulation are to be put into the critical region w is usually made with
the aid of a statistic T chosen on an intuitive basis. By a statistic T we mean of
course a function of the sample only, not depending on the c.d.f. F_n; thus
T(E) = T(X_1, ..., X_n). For a suitably chosen q, the q points of the
subpopulation {E'} giving T(E') values in a certain range (usually the q largest
or q smallest values) are put into w, and these q values are then called the
“significant” values.
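
The construction just described can be sketched in a few lines. This is a minimal illustration rather than any author's procedure: the names `subpopulation` and `in_critical_region`, the toy statistic and the toy sample are all assumptions made here, and the sketch assumes that the s points of {E'} are distinct and that T has no tie at the cut-off.

```python
from itertools import permutations

def subpopulation(E, perms):
    """Apply every permutation in S (given as index tuples) to the coordinates of E."""
    return [tuple(E[i] for i in p) for p in perms]

def in_critical_region(E, perms, T, q):
    """True if E is one of the q points of {E'} with the largest values of T.

    Assumption: the s points are distinct and T has no tie at the q-th value.
    """
    values = sorted((T(Ep) for Ep in subpopulation(E, perms)), reverse=True)
    return T(E) >= values[q - 1]

# Hypothetical toy case: complete symmetry in three coordinates, so s = 3! = 6.
S = list(permutations(range(3)))
T = lambda e: e[0] - e[2]      # an arbitrary statistic chosen for illustration
E = (5.0, 2.0, 1.0)
q = 1
alpha = q / len(S)             # significance level q/s
print(in_critical_region(E, S, T, q), alpha)
```

Whatever point of the subpopulation is observed, exactly q of its s points fall in w, which is the property behind α = q/s.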

Before proceeding further let us consider an example illustrating all the
definitions we have introduced thus far. Suppose that on the basis of a sample of
m pairs (X_i, Y_i), i = 1, 2, ..., m, from a bivariate population with unknown
c.d.f. G(x, y) we wish to test the independence of the random variables X, Y.
To fit our general notation write Y_i = X_{i+m}. Assuming only that the sample
is random, we have, with n = 2m, that the c.d.f. of the sample point E is of the
form

\[ F_n(x_1, \ldots, x_n) = \prod_{i=1}^{m} G(x_i, x_{i+m}). \]

Now suppose we know or are willing to assume further that the unknown c.d.f.
G(x, y) of the population is in a certain class Ω_ν^{(2)} of bivariate c.d.f.’s,
where Ω_ν^{(2)} is the bivariate analogue of the class Ω_ν of univariate c.d.f.’s
defined in section 1; thus if we knew the unknown G(x, y) were continuous, we
would have G ∈ Ω_2^{(2)}. The class Ω of admissible F_n is then

\[ \Omega = \left\{ F_n \,\middle|\, F_n = \prod_{i=1}^{m} G(x_i, x_{i+m});\ G \in \Omega_\nu^{(2)} \right\}, \]

where the notation {F_n | F_n of the form ...} denotes the class of all F_n of the
form .... The hypothesis of independence may now be expressed as H: F_n ∈ ω,
where the subclass ω of Ω is

\[ \omega = \left\{ F_n \,\middle|\, F_n = \prod_{i=1}^{m} F^{(1)}(x_i) \prod_{i=m+1}^{2m} F^{(2)}(x_i);\ F^{(k)} \in \Omega_\nu \right\}. \]

The set S of permutations which leave all F_n ∈ ω invariant is obtained by
making all possible permutations of the first m coordinates x_1, ..., x_m among
themselves, and of the second m coordinates x_{m+1}, ..., x_{2m} among themselves.
The total number s of permutations in S is thus (m!)². Making these permutations
on the coordinates of any point E in W, we get the set {E'} of (m!)² points. The
points of {E'} are distinct unless E lies in the region W_0 defined as the union
of all hyperplanes x_i = x_j where i ≠ j and i, j are both in the set of integers
1, 2, ..., m or else both in the set m + 1, ..., 2m. Pitman [28] applied the
randomization method to this problem, using as the statistic T(E) the numerical
value of the (sample) Pearsonian correlation coefficient,


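\[ T(E) = \left| \frac{\sum_{i=1}^{m} (x_i - \bar{x}_1)(x_{i+m} - \bar{x}_2)}{\left\{ \sum_{i=1}^{m} (x_i - \bar{x}_1)^2 \sum_{i=m+1}^{2m} (x_i - \bar{x}_2)^2 \right\}^{1/2}} \right|, \qquad \bar{x}_1 = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad \bar{x}_2 = \frac{1}{m}\sum_{i=m+1}^{2m} x_i, \]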


the large values of T being the significant ones. We note that T(E) takes on
at most m! different values over the subpopulation, since applying the same
permutation to both sets of m coordinates leaves T unchanged. What we previously
called a “suitably chosen” q would be in the present case a multiple of m!, and
the choice of significance level α = q/s would then be limited to multiples of
1/m!.
The method of randomization is seen to exploit whatever symmetry properties
the F_n in ω possess as a class. A special case of the general method is the
method of ranks. This gives regions of an especially simple form defined by
certain inequalities on the coordinates. Probably the only case in which the
method of ranks will ever be used is when the F_n in ω have the following special
kind of symmetry: Suppose they are completely symmetrical in each of certain
subsets of the coordinates, say t sets of n_1, n_2, ..., n_t coordinates,
respectively, where Σ_{i=1}^{t} n_i = n. We may assume the coordinates numbered so
that F_n is completely symmetrical in the set x_{p_i+1}, x_{p_i+2}, ...,
x_{p_i+n_i} (p_i = Σ_{j=1}^{i−1} n_j, i = 2, 3, ..., t; p_1 = 0), for all F_n ∈ ω.
The set S of permutations is thus generated by making all n_i! permutations on
the n_i coordinates x_{p_i+1}, ..., x_{p_i+n_i} (i = 1, ..., t), so that the total
number of permutations in S is s = n_1! n_2! ··· n_t!.

Corresponding to the i-th set of coordinates in which F_n is symmetrical, let
us divide the sample space W up into n_i! regions defined by

\[ x_{p_i+1} < x_{p_i+2} < \cdots < x_{p_i+n_i} \]

and the n_i! − 1 other inequalities obtained by permuting the subscripts in the
above. Denote these regions by w_{i,k} (k = 1, ..., n_i!). Let

\[ w_{k_1,k_2,\ldots,k_t} = w_{1,k_1} \cap w_{2,k_2} \cap \cdots \cap w_{t,k_t}, \]

that is, w_{k_1,k_2,...,k_t} is the part of W common to the regions w_{1,k_1},
w_{2,k_2}, ..., w_{t,k_t}. This process divides the sample space W up into s
disjoint regions w_{k_1,k_2,...,k_t}, which we shall now denote simply by w_σ
(σ = 1, ..., s). The set {w_σ} of regions covers all of the sample space W except
the region W_0 on which certain coordinates become equal. We shall say that the
sample point E has the σ-th ranking, R_σ, if E falls in w_σ. We may then speak of
a random variable R = R(E), the “ranking”, taking on the s possible values R_σ,
or the “tied” ranking R_0 if E ∈ W_0.

A critical region w is constructed by the method of ranks by taking w to be
the union of q of the regions w_σ. Those rankings R_σ corresponding to the q
regions w_σ constituting the critical region w will be called the significant
rankings. Any statistic T(E) used as the criterion to decide which are the
significant rankings now becomes a function of the ranking R only, T(E) = U(R).
We may regard the method of ranks as a simplification of the problem of testing
statistical hypotheses, in which the infinite n-dimensional sample space W is
replaced by a finite space of s + 1 points R_σ. If Ω is a class of continuous F_n
we may ignore the point R_0 since then Pr{R = R_0} = 0.
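
The passage from the sample point E to its ranking R(E) can be sketched as follows. This is an illustrative assumption about representation only: a ranking is encoded here as a tuple of within-set rank vectors, the tied ranking R_0 is encoded as `None`, and the function name `ranking` is hypothetical.

```python
from itertools import permutations, product

def ranking(E, set_sizes):
    """Return the within-set rank vectors of E, or None for the tied ranking R_0.

    set_sizes = (n_1, ..., n_t) describes the consecutive sets of coordinates
    in which the c.d.f.'s are assumed completely symmetrical.
    """
    result, start = [], 0
    for n_i in set_sizes:
        block = E[start:start + n_i]
        if len(set(block)) < n_i:          # equal coordinates: E lies in W_0
            return None
        ordered = sorted(block)
        result.append(tuple(ordered.index(v) + 1 for v in block))
        start += n_i
    return tuple(result)

E = (3.2, 1.5, 2.0, 10.0, 20.0)            # hypothetical sample, n_1 = 3, n_2 = 2
print(ranking(E, (3, 2)))

# The s = n_1! n_2! points of the subpopulation receive s distinct rankings.
points = [a + b for a, b in product(permutations(E[:3]), permutations(E[3:]))]
distinct = {ranking(p, (3, 2)) for p in points}
print(len(distinct))
```

Any statistic computed from the returned value alone is a function U(R) of the ranking in the sense of the text.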

In the problem of independence, which we have used before to illustrate the
definitions of this section, the method of ranks was applied by Hotelling and
Pabst [9], who took as the statistic U(R) the numerical value of the Spearman
coefficient of rank correlation, large values being significant.

The method of randomization yields similar regions if ω is a class of continuous
functions. What will the method get us if we drop the continuity restriction?
In this case we can no longer ignore the possibility that the sample point E fall
in the exceptional region W_0, for we do not have Pr{E ∈ W_0} = 0. We owe to
Pitman [27] the following idea: We continue to use the subpopulation {E'} and
a chosen statistic T(E) as above, but instead of separating the points of {E'}
into two classes (significant points and non-significant points) by means of T(E)
we now add a third class of “doubtful” points.6 If the s points of the set {E'}
are not distinct they are to be counted according to their multiplicities under
the process of applying the permutations of the set S to the coordinates of E.
Suppose that the large values of T are significant. Number the s points of {E'} so
that T(E'_1) ≥ T(E'_2) ≥ ... ≥ T(E'_s). If T(E'_q) > T(E'_{q+1}) we call
E'_1, ..., E'_q significant, and the rest non-significant. However if
T(E'_q) = T(E'_{q+1}), we term all points E' with T(E') = T(E'_q) doubtful,
points E' for which T(E') > T(E'_q), significant, and points E' with
T(E') < T(E'_q), non-significant. This process divides the sample space W up into
three regions instead of the customary two, namely, a rejection region w_R, an
acceptance region w_A, and a doubtful region w_D. It is a special case of the
following procedure: For every set {E'} define positive integers
m_R = m_R({E'}) and m_A = m_A({E'}) such that m_R ≤ q, m_A ≤ s − q, and put m_R of
the points E' in w_R, m_A of the points E' in w_A, and the remaining
s − m_A − m_R of the points E' in w_D, in any way so that w_R and w_A are Borel
regions. When any E' is assigned to w_R or w_A it is to be counted according to
its multiplicity as defined above, if {E'} contains less than s distinct points.
It may be shown that with α = q/s, Pr{E ∈ w_R | F_n} ≤ α and
Pr{E ∈ w_A | F_n} ≤ 1 − α for all F_n ∈ ω, that is, whenever H is true.

6 Instead of the terms significant, non-significant, doubtful, Pitman uses
discordant, concordant, neutral.
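
The three-way division can be made concrete with a small sketch. It takes as given the list of the s values of T over the subpopulation (repeated according to multiplicity) and classifies an observed value; the function name `classify_point` and the numbers are assumptions for illustration only.

```python
def classify_point(t_obs, t_values, q):
    """Classify an observed T value as significant, doubtful or non-significant.

    t_values holds the s subpopulation values of T, counted with multiplicity;
    large values of T are taken to be significant.
    """
    ordered = sorted(t_values, reverse=True)          # T(E'_1) >= ... >= T(E'_s)
    cut = ordered[q - 1]                              # T(E'_q)
    tie_at_cut = q < len(ordered) and ordered[q] == cut
    if t_obs > cut:
        return "significant"
    if t_obs == cut:
        return "doubtful" if tie_at_cut else "significant"
    return "non-significant"

values = [5, 4, 4, 3, 1, 1]                           # hypothetical, s = 6
print([classify_point(t, values, q=2) for t in (5, 4, 3)])
```

With q = 2 the value 4 is tied across the cut, so only one point is called significant (m_R = 1 ≤ q), which is how the bound Pr{E ∈ w_R | F_n} ≤ α arises.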

Before closing this section on the method of randomization, we mention a few
difficulties which frequently arise when it is applied. Except for very small
samples the calculation determining whether or not the observed value E_0 of
the sample point E belongs to the significant points of the subpopulation {E'}
generated by E_0 is usually extremely tedious. In such cases the author of the
test often gives an approximation to the discrete distribution of his statistic
T(E) over the subpopulation {E'} by means of some familiar continuous
distribution for which tables are available, the laborious exact calculation by
enumeration then being replaced by the computation of a few moments (that is,
values of certain homogeneous polynomials in the observed coordinates) and the
use of existing tables of percentage points of the continuous distribution.8
Barring some papers where the method of ranks is used, the justification of these
approximations is never satisfactory from a mathematical point of view, the
argument being based on a study of the behavior of two, or at most four,
moments. The only exception to the last statement appears to be a very recent
paper by Wald and Wolfowitz [42], which may point the way to genuine derivations
of asymptotic distributions for the non-rank case of the randomization
method. We shall distinguish between derivations of asymptotic distributions
and arguments based on two or four moments by saying that a distribution is
“proved” in the former case and “fitted” in the latter.

Another difficulty arises, most noticeably in the method of ranks, out of the
possibility of equality of the observed coordinates. In the distribution theory
this is usually avoided by assuming ω to be a class of continuous c.d.f.’s, so
that Pr{E ∈ W_0 | F_n} = 0 for all F_n ∈ ω, but in practice, since the
measurements are usually made to about three significant figures, ties do occur
in the sample. While some scattered work has been done on this question there is
need for a thorough general treatment.

In some of the work that has been done on particular non-parametric tests
it is not very clear just what the null hypothesis H is. Two situations often
occur: Suppose H: F_n ∈ ω is the hypothesis we actually wish to test at
significance level α. Let w be the chosen critical region and ω_w the class of
F_n for which Pr{E ∈ w | F_n} = α. The two situations are (i) ω is a proper
subset of ω_w, and (ii) ω_w is a proper subset of ω. Of these (i) seems less
objectionable, for then the probability of a type I error (rejecting H when true)
is strictly α, but the probability of accepting H is the same when certain
alternatives (F_n ∈ ω_w − ω) are true as when H is true. In case (ii) the
probability of a type I error is not α unless F_n is in the subclass ω_w of ω;
thus there might be a much higher probability than α of rejecting H when it is
true, if the true F_n ∈ ω − ω_w. To illustrate situation (i) consider
K. Pearson’s χ²-test for goodness of fit of a theoretical distribution F_0(x) to
a sample E. Suppose E is from a univariate population whose true c.d.f. is F(x).
If F has the property that for the intervals I_j defined in section 3,
∫_{I_j} dF = ∫_{I_j} dF_0, j = 1, 2, ..., N, then the probability of rejection is
the same as when the hypothesis is true. An example of (ii) might occur if we
wish to test whether the means of two univariate populations are the same. If we
use one of the tests of section 4 in which the probability of rejection is
calculated under the assumption that the distributions of the populations are
the same, then we do not know that the probability of a type I error is α, for
the samples might come from two populations with the same mean but different
distributions.

8 In many cases the approximate test obtained by fitting a familiar distribution
is found to coincide with widely used tests based on normality assumptions. In
such cases if the fitting is asymptotically correct the following remarks are
justified: (1) If the non-parametric test is used in a case where we hesitate to
assume normality but normality actually exists, the non-parametric test is
asymptotically as efficient as the older test assuming normality. (2) If
normality is assumed when it does not exist, no error is incurred asymptotically
when the older test is used.

3. Goodness of fit. Randomness. The non-parametric case of testing goodness
of fit is the following: On the basis of a sample E from a population with
c.d.f. F(x) known to be a member of some Ω_ν, we wish to test whether F = F_0,
where F_0 is a given c.d.f. The class of admissible c.d.f.’s F_n is

\[ \Omega = \left\{ F_n \,\middle|\, F_n = \prod_{i=1}^{n} F(x_i);\ F \in \Omega_\nu \right\}, \]

and the hypothesis H specifies that F_n ∈ ω, where

\[ \omega = \left\{ F_n \,\middle|\, F_n = \prod_{i=1}^{n} F_0(x_i) \right\}. \]

K. Pearson’s χ²-test [25] consists of choosing an integer N, dividing the x-axis
up into a set {I_j} of disjoint intervals (j = 1, 2, ..., N), and using as
statistic T(E) the Pearsonian chi square,

\[ \chi_P^2 = \sum_{j=1}^{N} \frac{[m_j - \mathcal{E}(m_j)]^2}{\mathcal{E}(m_j)}, \]

where m_j is the number of observed coordinates of E in I_j, and
ℰ(m_j) = n ∫_{I_j} dF_0. Large values of χ_P² are regarded as significant. Exact
significance levels for χ_P² could be obtained by considering its distribution
over the subpopulation {E'} generated by the sample. This process would lead to
the multinomial distribution of the m_j mentioned in the usual derivations of the
asymptotic distribution of χ_P² (for n → ∞ with N fixed). Pearson himself found
this asymptotic distribution, namely the χ²-distribution with N − 1 degrees of
freedom. In studying the problem of a “best” choice of the set {I_j} of
intervals, Mann and Wald [17] adopted a non-parametric treatment, with ν = 2 for
the class Ω_ν above.
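
A minimal sketch of the computation of the Pearsonian chi square follows. The interval convention (half-open intervals closed on the right, given by their interior cut points), the function name and the uniform F_0 of the example are assumptions made for illustration.

```python
def pearson_chi_square(sample, cuts, F0):
    """Pearsonian chi square for intervals (-inf, a_1], (a_1, a_2], ..., (a_{N-1}, inf).

    cuts = [a_1, ..., a_{N-1}] in increasing order; F0 is the hypothetical c.d.f.
    """
    n = len(sample)
    N = len(cuts) + 1
    cdf = [0.0] + [F0(a) for a in cuts] + [1.0]
    expected = [n * (cdf[j + 1] - cdf[j]) for j in range(N)]   # n * integral of dF0 over I_j
    observed = [0] * N
    for x in sample:
        observed[sum(1 for a in cuts if x > a)] += 1           # index of the interval holding x
    return sum((m - e) ** 2 / e for m, e in zip(observed, expected))

uniform_cdf = lambda x: min(max(x, 0.0), 1.0)                  # hypothetical F_0
cuts = [0.25, 0.5, 0.75]                                       # N = 4 intervals
print(pearson_chi_square([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9], cuts, uniform_cdf))
print(pearson_chi_square([0.1] * 4 + [0.9] * 4, cuts, uniform_cdf))
```

The value would then be referred to the χ²-distribution with N − 1 degrees of freedom, or to the exact multinomial distribution of the m_j when n is small.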

Another test not depending on a choice of intervals I_j could be made by using
confidence belts for F as described in section 9 and rejecting H at the α level
of significance if the graph of F_0 is not covered by the belt with confidence
coefficient 1 − α.

The problem of randomness is usually non-parametric; in the univariate case
the class ω of this problem is identical with the class Ω of the preceding
problem. The index ν and the class Ω for the problem of randomness would depend
on the specific situation in which it arises. With two exceptions [42, 52], all
tests of randomness proposed thus far have been functions of runs in the sample.
Two kinds of runs have been considered, runs up and down, and runs above and
below the median [1, 4, 14, 19, 32, 44, 51]. We note that the set S of
permutations determined by ω is the set of all n! permutations on the n
coordinates of E. Suppose now ν = 2. The proof [31] that all similar regions w
have the randomization structure applies to this problem. On the other hand such
a region w has the property Pr{E ∈ w | F_n} = α for any F_n which is completely
symmetrical in the coordinates. Difficulty (i) discussed at the end of section 2
now arises if Ω contains such symmetrical alternatives. The definition of an
appropriate class Ω − ω of alternatives and the question of the power of tests
against the alternatives make the problem of randomness a difficult one. Beyond
these few remarks we refer the reader to an expository paper by Wolfowitz [51]
devoted to the problem in the previous issue of this journal, and to a paper by
Wald and Wolfowitz [42] in the present issue. The latter paper is one of the
exceptions, previously mentioned, not based on the method of ranks.

4. The problem of two samples. Suppose X_1, ..., X_{m_1} and Y_1, ..., Y_{m_2}
are samples from univariate populations with c.d.f.’s F(x) and G(x)
respectively, where we assume F, G ∈ Ω_ν, and that we wish to test the hypothesis
that F = G. Write Y_i = X_{i+m_1}, so that with n = m_1 + m_2 we have

\[ \Omega = \left\{ F_n \,\middle|\, F_n = \prod_{i=1}^{m_1} F(x_i) \cdot \prod_{j=m_1+1}^{n} G(x_j);\ F, G \in \Omega_\nu \right\}, \]

\[ \omega = \left\{ F_n \,\middle|\, F_n = \prod_{i=1}^{n} F(x_i);\ F \in \Omega_\nu \right\}. \]

The set S of permutations determining the subpopulation {E'} consists of all
n! permutations on the n coordinates of E. The writer has shown [31] that no
similar regions exist in this case if ν = 0, while if ν = 2, 3, or 4 a similar
region necessarily has the randomization structure.

The first non-parametric attack on this problem was given [26] by K. Pearson.
The x-axis is divided up into intervals I_1, ..., I_N as in section 3. Let
m_{j1} and m_{j2} be the number of measurements from the first and second
samples, respectively, falling in I_j, so that Σ_{j=1}^{N} m_{jk} = m_k,
k = 1, 2. The statistic T(E) used is

\[ \chi_{P'}^2 = (m_1 m_2)^{-1} \sum_{j=1}^{N} \frac{(m_2 m_{j1} - m_1 m_{j2})^2}{m_{j1} + m_{j2}}, \]

with large values significant. In view of the remarks at the end of the last
paragraph it would be necessary to calculate the distribution of χ_{P'}² over
the subpopulation {E'} in order to get a similar region. Pearson found the
asymptotic distribution of χ_{P'}² under the null hypothesis to be the
χ²-distribution with N − 1 degrees of freedom.

A solution based on the method of randomization was proposed by Pitman
[27]; the special case of this solution for m_1 = m_2 was published a little
earlier by R. A. Fisher [6]. Pitman employed the numerical value of the
difference of the sample means as statistic,

\[ T(E) = \left| \sum_{i=1}^{m_1} x_i / m_1 - \sum_{i=m_1+1}^{n} x_i / m_2 \right|, \]

large values being significant. He fitted an incomplete Beta-distribution to the
subpopulation distribution of his T(E), and noted that this approximation
gave a result identical with the usual t-test valid when the population
distributions F(x) and G(x) are assumed normal with equal variances.
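
For small samples the exact subpopulation distribution of this statistic can be enumerated directly instead of fitted. The sketch below is an illustration under stated assumptions: it enumerates the C(n, m_1) ways of choosing which coordinates form the first sample (each corresponding to m_1! m_2! of the n! permutations), and the function name and data are hypothetical.

```python
from itertools import combinations

def two_sample_level(x, y):
    """Fraction of the subpopulation with |difference of means| >= the observed value."""
    pooled = list(x) + list(y)
    m1, m2 = len(x), len(y)
    total = sum(pooled)

    def T(first_indices):
        s1 = sum(pooled[i] for i in first_indices)
        return abs(s1 / m1 - (total - s1) / m2)

    t_obs = T(range(m1))
    splits = list(combinations(range(m1 + m2), m1))
    hits = sum(1 for idx in splits if T(idx) >= t_obs - 1e-12)   # tolerance for rounding
    return hits / len(splits)

level = two_sample_level([1, 2, 3], [4, 5, 6])    # hypothetical data
print(level)
```

Here only the two extreme splits attain the observed difference of 3, giving a level of 2/20; the enumeration grows as C(n, m_1), which is the tedium the fitted Beta-distribution was meant to avoid.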

Turning now to tests based on the method of ranks, we mention here that one
for the case m_1 = m_2 was given by R. A. Fisher as early as 1925, namely the
“sign test” or “binomial series test” [3]. We may (and Fisher did) regard this
as a test of a less restrictive hypothesis, and shall describe it in section 6.
Between 1938 and 1940 several tests employing ranks were proposed for the problem
of two samples. The earliest of these, by W. R. Thompson [36], was shown to
be inconsistent (section 11) with respect to certain alternatives F_n ∈ Ω − ω by
Wald and Wolfowitz [40]. These authors used as statistic U(R) the total number
of runs in a sequence V of n elements constructed as follows: Rank the
measurements of the combined sample in order of increasing magnitude. According
as the j-th measurement in this rank order is from the first or second
sample, put the j-th element of the sequence V equal to 1 or 2. In this test small
values of the statistic U(R) are regarded as significant. The test is now quite
practicable (for ν = 2) for certain ranges of m_1 and m_2. For m_1 and m_2 both
≤ 20, tables by Swed and Eisenhart [34] give the 1% and 5% significant values
of U(R). Wald and Wolfowitz proved that for n → ∞ with k = m_1/m_2 fixed,
the distribution of U(R) is asymptotically normal with mean 2m_1/(1 + k) and
variance 4km_1/(1 + k)³. Swed and Eisenhart computed that for m_1 = m_2 this
gives a very satisfactory approximation outside the range of their tables.
However, further computation needs to be done on the accuracy of this
approximation for m_1 ≠ m_2 and one of them > 20.
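
The run statistic and its normal approximation may be sketched as follows, assuming no ties in the combined sample. The function names are hypothetical; the mean and variance are those quoted above, and the lower tail is used because small values of U(R) are significant.

```python
from math import erf, sqrt

def runs_statistic(x, y):
    """Total number of runs in the sequence V of sample labels (assumes no ties)."""
    labelled = sorted([(v, 1) for v in x] + [(v, 2) for v in y])
    V = [label for _, label in labelled]
    return 1 + sum(1 for a, b in zip(V, V[1:]) if a != b)

def runs_lower_tail(u, m1, m2):
    """Normal approximation to Pr{U(R) <= u} with k = m1/m2."""
    k = m1 / m2
    mean = 2 * m1 / (1 + k)
    variance = 4 * k * m1 / (1 + k) ** 3
    z = (u - mean) / sqrt(variance)
    return 0.5 * (1 + erf(z / sqrt(2)))

print(runs_statistic([1, 2, 3], [4, 5, 6]))     # fully separated samples
print(runs_statistic([1, 3, 5], [2, 4, 6]))     # fully interleaved samples
print(runs_lower_tail(40, 50, 50))              # mean 50, variance 25
```

For samples within the range of the published tables the exact significant values would be preferred to this approximation.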

Another test based on ranks was advanced by Dixon [2], using as statistic
U(R) the random variable

\[ C^2 = \sum_{j=1}^{m_1+1} \left[ (m_1 + 1)^{-1} - n_j / m_2 \right]^2, \]

where the integers n_j are defined thus: Let Z_1 < Z_2 < ... < Z_{m_1} denote the
measurements of the first sample arranged in rank order. Then n_j is the number
of measurements in the second sample falling in the interval (Z_{j−1}, Z_j), where
we define Z_0 = −∞, Z_{m_1+1} = +∞. Large values of C² are significant. Dixon
tabulated the 1%, 5%, and 10% significant values of C² for m_1, m_2 = 2, 3, ...,
10; for larger m_1, m_2 he fitted a χ²-distribution.
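
A short sketch of the computation of C², assuming no ties between the two samples; the function name `dixon_c2` and the data are hypothetical.

```python
from bisect import bisect_left

def dixon_c2(first, second):
    """C^2 = sum over j of [1/(m1 + 1) - n_j/m2]^2 (assumes no ties)."""
    z = sorted(first)                       # Z_1 < ... < Z_{m1}
    m1, m2 = len(first), len(second)
    n = [0] * (m1 + 1)
    for y in second:
        # bisect_left counts the Z's below y, i.e. the 0-based index of (Z_{j-1}, Z_j)
        n[bisect_left(z, y)] += 1
    return sum((1 / (m1 + 1) - nj / m2) ** 2 for nj in n)

print(dixon_c2([1, 2, 3], [0.5, 1.5, 2.5, 3.5]))   # second sample spread evenly
print(dixon_c2([1, 2, 3], [4, 5, 6, 7]))           # second sample all above the first
```

The statistic is zero when the second sample is spread evenly over the m_1 + 1 intervals and grows as it concentrates in a few of them.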

A paper by Smirnoff [33, 16] suggests the following as a statistic U(R): Let
G_{m_1}(x) and G_{m_2}(x) be the “empirical distribution functions” of the first
and second samples, that is, m_i G_{m_i}(x) is the number of measurements in the
i-th sample ≤ x (i = 1, 2), and take 7

\[ U(R) = (m_1^{-1} + m_2^{-1})^{-1/2} \sup_x \left| G_{m_1}(x) - G_{m_2}(x) \right| \]

with large values significant. Smirnoff showed that if ν = 2 the asymptotic
distribution of his statistic U(R) is a certain c.d.f. Φ(λ), previously introduced
by Kolmogoroff [15]. More specifically, let Φ_{m_1,m_2}(λ) = Pr{U(R) < λ; ν = 2}.
Then if n → ∞ with m_1/m_2 fixed, we have Φ_{m_1,m_2}(λ) → Φ(λ). The definition
of Φ(λ) and references to tables of Φ(λ) are given in section 9. If instead of
assuming ν = 2 we take ν = 0, the risk of type I errors may be controlled to the
extent that Pr(rejecting H) ≤ α for all F_n ∈ ω, by employing Smirnoff’s theorem
stating Pr{U(R) < λ; ν = 0} ≥ Φ_{m_1,m_2}(λ), where Φ_{m_1,m_2}(λ) is defined
above.
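
The statistic can be computed by comparing the two empirical distribution functions at the observed points only, since both are step functions. The sketch below is an illustration under that observation; the function name and data are hypothetical, and the quadratic-time counting is chosen for clarity rather than speed.

```python
from math import sqrt

def smirnov_statistic(x, y):
    """(1/m1 + 1/m2)^(-1/2) times the largest gap between the two empirical c.d.f.'s."""
    m1, m2 = len(x), len(y)
    gap = 0.0
    for v in list(x) + list(y):             # the supremum is attained at a sample point
        g1 = sum(1 for a in x if a <= v) / m1
        g2 = sum(1 for b in y if b <= v) / m2
        gap = max(gap, abs(g1 - g2))
    return gap / sqrt(1 / m1 + 1 / m2)

print(smirnov_statistic([1, 2, 3, 4], [5, 6, 7, 8]))   # fully separated samples
```

For fully separated samples the largest gap is 1, so the statistic equals (1/m_1 + 1/m_2)^{-1/2}.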

A test for the problem of two samples obtained by Wolfowitz by a modification
of the likelihood ratio procedure will be discussed in section 12. When
m_1 = m_2 the non-parametric analysis of variance tests of the “randomized
blocks” type described in section 6 might also be used to test the more restricted
hypothesis considered in this section.

The non-parametric problem of k samples has been attacked by Welch [46], 
who used the method of randomization with the analysis of variance ratio as 
statistic T(E), and by Wolfowitz [50] with his modified likelihood ratio method. 

In this as in all the other sections where several solutions of the same problem
of statistical inference are described, the question as to the relative merits of
the various solutions arises, and in every case the question is as yet mostly or
entirely unanswered. The only easy conclusion about the tests of this section
would seem to be that the tests of K. Pearson and Pitman are not consistent with
respect to certain subclasses of the admissible alternatives, according to the
definition of section 11.

7 We use the notations sup and inf respectively for least upper bound and
greatest lower bound.

5. Independence. The classes Ω and ω defining the problem of independence
have already been stated in section 2, in which we described Pitman’s test [28]
based on the randomization method and the use of |r| as statistic T(E), where
r is the sample value of the Pearsonian correlation coefficient. Pitman fitted
an incomplete Beta-distribution to the subpopulation distribution of r² and found
the resulting approximation for ν = 2 equivalent to the usual test employing the
t-distribution and valid for the case of normality.

In section 2 we also mentioned the test earlier proposed by Hotelling and
Pabst [9], which is based on the method of ranks and employs the statistic
U(R) = |r'|, where r' is the Spearman rank correlation coefficient. They proved
that for ν = 2 the distribution of r' is asymptotically normal if F_n ∈ ω.
Pitman’s fitting of an incomplete Beta-distribution applies also to (r')², and
Kendall, Kendall, and Smith [12] made numerical calculations indicating that this
gives a better approximation than the normal distribution. Since r' is calculated
from Σd², the sum of the squared rank differences, the latter may equivalently
be used as the statistic U(R), small and large values of Σd² being now both
significant. Kendall, Kendall, and Smith [12] found the exact distribution of
Σd² for the number of pairs m ≤ 8. This work was anticipated by Olds [23],
who calculated the exact distribution of Σd² for m ≤ 7, and by fitting certain
distributions for m > 7, gave a very useful table of the 1%, 2%, 4%, 10% and
20% significant values of Σd² for m ≤ 30. It would be desirable to have these
tables extended by inclusion of the 5% values.
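
For very small m the exact distribution of Σd² can be reproduced by brute-force enumeration of the m! rankings. This sketch is an illustration only (it is not the method of the cited tables); the function names are hypothetical, and the relation r' = 1 − 6Σd²/(m(m² − 1)) is the standard formula for untied ranks, stated here as an assumption since the text does not display it.

```python
from collections import Counter
from itertools import permutations

def sum_d2_distribution(m):
    """Frequency of each value of the sum of squared rank differences over all m! rankings."""
    counts = Counter(
        sum((i - p[i]) ** 2 for i in range(m)) for p in permutations(range(m))
    )
    return dict(sorted(counts.items()))

def spearman_from_sum_d2(sum_d2, m):
    """Standard relation between r' and the sum of squared rank differences (no ties)."""
    return 1 - 6 * sum_d2 / (m * (m * m - 1))

print(sum_d2_distribution(3))
print(spearman_from_sum_d2(0, 3), spearman_from_sum_d2(8, 3))
```

Dividing the frequencies by m! gives the exact null probabilities; the enumeration becomes impractical well before m = 30, which is why fitted distributions were used for the larger tables.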

M. G. Kendall [10] proposed another measure of rank correlation whose
significant values are easier to calculate than those of Σd², but since Olds’
tables for the latter are available, Kendall’s innovation does not seem to possess
much practical advantage. Wolfowitz [50], using his modified likelihood ratio
method, gave another test for independence and generalized it to the problem of
independence of k random variables.

6. Analysis of variance. We suppose that we have n = rc measurements
arranged in a rectangular layout of r rows and c columns. The r rows might
correspond to the blocks and the c columns to the varieties in an agricultural
experiment. The null hypothesis H is that of “no difference” in the column
effects. The measurement in the i-th row and j-th column is supposed to be on
a random variable 8 X_{ij} with c.d.f. F^{(ij)}(x) = Pr{X_{ij} ≤ x}. Let us assume
at first that all the X_{ij} are independent. The joint c.d.f. of the random
variables X_{1j}, ..., X_{rj} of the j-th column is then

\[ F^{(j)}(x_1, \ldots, x_r) = \Pr\{X_{1j} \le x_1, \ldots, X_{rj} \le x_r\} = \prod_{i=1}^{r} F^{(ij)}(x_i). \]

8 The double subscript notation is more convenient here than that used in
section 2; after the class ω has been defined the reader will see that the numbers
n_i used in section 2 to describe the symmetry of the F_n ∈ ω are all equal to c,
and the X_{ij} of the present section coincides with the X_{p_i+j} of section 2.

The symbol F_n for the joint c.d.f. of all n random variables now denotes
F_n(x_{11}, ..., x_{1c}; ...; x_{r1}, ..., x_{rc}). Ω is the class of all F_n of
the form

\[ F_n = \prod_{j=1}^{c} F^{(j)}(x_{1j}, \ldots, x_{rj}), \]

where F^{(j)} is defined by the preceding equation, and all F^{(ij)} are in a
given class Ω_ν. The hypothesis H states that the column distributions are all
the same,

\[ F^{(j)}(x_1, \ldots, x_r) = F^{(1)}(x_1, \ldots, x_r) \qquad (j = 2, 3, \ldots, c), \]

without specifying F^{(1)}. ω is thus the subclass of Ω comprising all F_n of the
form

\[ F_n = \prod_{j=1}^{c} F^{(1)}(x_{1j}, \ldots, x_{rj}). \]

The F_n in ω may be written

\[ F_n = \prod_{i=1}^{r} \left\{ \prod_{j=1}^{c} F^{(i1)}(x_{ij}) \right\}. \]

Regarding the factor in braces for fixed i, we see that it is left unchanged by
any permutation of the c coordinates x_{i1}, ..., x_{ic}. The set S of
permutations is thus determined, and the subpopulation {E'} consists of the
(c!)^r points obtained by permuting among themselves the first set of c
coordinates, the second set of c coordinates, ..., the r-th set of c coordinates
of E = (x_{11}, ..., x_{1c}; ...; x_{r1}, ..., x_{rc}).

The above argument leading to the subpopulation {E'} of (c!)^r points is based
squarely on the assumed independence of the n random variables X_{ij}. Suppose
now that the X_{ij} are not known to be independent, as may happen in
agricultural experiments [24]. To make the discussion concrete suppose in the
r × c layout we have been considering, the rows refer to blocks (of plots) and the
columns to varieties, so that the random variable X_{ij} is the yield of the j-th
variety on the i-th block. We owe to R. A. Fisher the method of including
early in the experiment a random process which leads to the same “equally
likely” subpopulation of points {E'} obtained before in the case of independence.
This physical process which he calls “randomization” then permits the
construction of critical regions by the “method of randomization” in the sense we
have been using the term.

To explain the experimental process of randomization we shall imagine another 
r X c layout and a random set of mappings of the two layouts onto each other. 
In each block there are c plots and we now assume these numbered from 1 to c, 
the numbering to be held fixed. The second layout refers to the plots; the rows 
again correspond to the blocks, but the columns now correspond to the number 
of the plot in the block, thus the i, ] cell represents the j-th plot in the i-th block. 
Now consider all 1:1 correspondences or mappings between the two layouts so 
that the i-th row always maps onto the z-th row (i = 1, • ■ , r). There are 
s = (c!) r such mappings M k (k — 1, • ■ • , s). Suppose under the mapping M k 
the t, i cell in the block-plot layout maps on the i, jk cell of the block-variety 
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layout, 'where jk = jk (z, i)> and the i, j cell of the latter corresponds to the i, t h 
cell of the former, U — t k ii, j}. The physical randomization process consists of 
choosing the mapping ilf* so that all s mappings have the same probability 1/s 
of being chosen. In other words, the randomized block pattern is selected in 
such a way that all the s possible patterns have equal probabilities of being 
adopted in the experiment. Now let be the yield of the i, t plot if the variety 
assigned to it by the fc-th pattern is planted there, and let 6 {k) (j/u , • ■ ■ , y TC ) = 
Pr jail F™ < i/i'jj be the joint c.d.f. of the Fp\ In calculating the c.d.f. F„ of 
the Xi, associated with the first layout we must take account of the random 
process by which it is mapped onto the second: 

F„(xn, • • • , x„) = Pr{all X., < 

- t P r {all X* = 7& (W) } Pr]Y[ k U,) < ft,} 

fc -1 

= S G , * * 1 , 

*-l 

$\Omega$ consists of all $F_n$ of the above form with $G^{(k)}$ in a given
class, say $\Omega_\nu^{(n)}$. The hypothesis $H$ of “no difference” of
varieties asserts that the yields of the plots do not depend on the varieties
planted on them, that is, that all $G^{(k)}$ are the same, $G^{(k)} = G^{(1)}$,
without specifying $G^{(1)}$. $\omega$ is the subclass of $\Omega$ whose
members are of the form

\[
F_n = \sum_{k=1}^{s} \frac{1}{s}\, G^{(1)}(x_{1\,j_k(1,1)}, \ldots, x_{r\,j_k(r,c)}).
\]

It is now seen that any permutation in the set $S$ previously considered merely
rearranges the terms of the above sum, so that $F_n$ remains invariant, and we
have the same subpopulation $\{E'\}$ as before.

It is to be understood henceforth that either the $X_{ij}$ are known to be
independent or else an experimental randomization has been carried out as
described above, so that in either case the above set $\{E'\}$ of $(c!)^r$
points is the “equally likely” subpopulation.

The first application in the literature of the randomization method is found in
R. A. Fisher’s “sign test” or “binomial series test” [3] for the case of
randomized blocks with two columns ($c = 2$). Let $D_i$ be the difference
$X_{i1} - X_{i2}$. The statistic used is a function of the ranking only, namely
the number of $D_i > 0$, small and large values being significant. For
$\nu = 2$ its distribution under the null hypothesis is the binomial
distribution with the $n$ and $p$ of the usual notation equal respectively to
$r$ and $\tfrac{1}{2}$. This test may be regarded as the special case when
$c = 2$ of Friedman’s rank method for analysis of variance to be described below.
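
The sign test just described can be sketched in a few lines. The following is a minimal illustration, not Fisher's own computation: the function name `sign_test_p_value` and the convention of doubling the smaller binomial tail to obtain a two-sided level are assumptions made here, and zero differences are rejected because the continuous case ($\nu = 2$) excludes them.

```python
from math import comb


def sign_test_p_value(x1, x2):
    """Two-sided sign test for r blocks with two columns (c = 2).

    Returns (number of positive differences, two-sided significance level).
    Assumes no zero differences, as in the continuous case.
    """
    diffs = [a - b for a, b in zip(x1, x2)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    r = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    # Under the null hypothesis n_pos is Binomial(r, 1/2); both small and
    # large values are significant, so double the smaller tail.
    k = min(n_pos, r - n_pos)
    tail = sum(comb(r, i) for i in range(k + 1)) / 2 ** r
    return n_pos, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical yields for two varieties in eight blocks.
    print(sign_test_p_value([2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8]))
```

With all eight differences positive, the doubled tail is $2/2^8$, the smallest level attainable with $r = 8$.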

Fisher later [5] proposed another test for the case $c = 2$ not based on ranks,
and employing as statistic $T(E)$ the absolute value of the mean of the $D_i$
defined above, with large values significant. The exact distribution of this
statistic is very laborious to calculate unless $r$ is very small, and
K. R. Nair [20] pointed out that the use of the numerical value of the median
of the $D_i$ (or one of the two central values when $r$ is even) had the
advantage of a very easily calculated distribution (if $\nu = 2$). The latter
may be regarded as a modification of the rank method, the method of ranks being
applied not in the $2r$-dimensional sample space as described in section 2 but
in the $r$-dimensional space of the differences $D_i$. Nair also showed that
the distributions of the range and of the midpoint of the range of the $D_i$
are very simple.
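
For small $r$ the exact subpopulation distribution of Fisher's statistic can be obtained by brute force. The sketch below is an illustration under stated assumptions: for $c = 2$ the $(c!)^r = 2^r$ equally likely points correspond to the $2^r$ assignments of signs to the observed differences, the statistic is the absolute value of their mean, and the function name and the small floating-point tolerance are choices made here.

```python
from itertools import product


def fisher_randomization_p_value(diffs):
    """Exact randomization test for c = 2 using |mean of the differences|.

    Enumerates all 2**r sign assignments, so it is practical only for
    small r, which is the laboriousness the text refers to.
    """
    r = len(diffs)
    observed = abs(sum(diffs)) / r
    count = 0
    for signs in product((1, -1), repeat=r):
        t = abs(sum(s * d for s, d in zip(signs, diffs))) / r
        # Large values are significant; the tolerance guards against
        # round-off when comparing equal sums.
        if t >= observed - 1e-12:
            count += 1
    return count / 2 ** r


if __name__ == "__main__":
    print(fisher_randomization_p_value([1, 2, 3]))
```

The cost doubles with every additional block, which is why approximations or Nair's median-based alternative become attractive as $r$ grows.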

From here on we consider the general case $c > 2$, but when we speak of
distributions they will be understood to be for the case when the null
hypothesis is true and $\nu = 2$. Welch [45] considered using as $T(E)$ the
usual analysis of variance ratio appropriate to testing for “no difference” of
column effects. He transformed this to another statistic and calculated two
moments of its subpopulation distribution. The first moment always agrees with
that obtained under “normal theory”, that is for the case
$X_{ij} = C_i + Z_{ij}$, where the $C_i$ are constants and the $Z_{ij}$ are
independently normally distributed with the same variance and zero means, but
the second moment depends on the subpopulation $\{E'\}$. Here the exact
distribution of the statistic is of course in general much more tedious to
calculate than in the previous case $c = 2$; an incomplete Beta-distribution
was fitted by Welch. Welch anticipated Pitman [29] who obtained the same
results and got besides the third and fourth moments of Welch’s statistic.

The method of ranks was applied by Friedman [7] who employed as statistic
$U(R)$ a quantity formed as follows: Rank each set of row entries $X_{ij}$ (for
fixed $i$) in ascending order of magnitude, and let $r_{ij}$ be the rank of
$X_{ij}$, so that $r_{i1}, \ldots, r_{ic}$ is a rearrangement of the integers
$1, \ldots, c$. Let $\bar r_j$ be the mean rank of the $j$-th column,
$\bar r_j = \sum_{i=1}^{r} r_{ij}/r$, and take for $U(R)$

\[
U = C_{rc} \sum_{j=1}^{c} \left[\bar r_j - \mathcal{E}(\bar r_j)\right]^2,
\]

where $C_{rc}$ is a certain constant, and $\mathcal{E}(\bar r_j)$ is calculated
under the null hypothesis. For Friedman’s choice of $C_{rc}$, $U$ may be
rapidly computed from the equivalent formula

\[
U = -3r(c + 1) + 12 \sum_{j=1}^{c} \Bigl(\sum_{i=1}^{r} r_{ij}\Bigr)^2 \Big/ [rc(c + 1)].
\]
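
The computing formula above translates directly into code. The following sketch assumes no ties within a row (the continuous case), and the function name `friedman_statistic` is introduced here for illustration only.

```python
def friedman_statistic(table):
    """Friedman's U for an r x c table, via the rank-sum computing formula.

    `table` is a list of r rows, each a list of c distinct observations.
    """
    r = len(table)
    c = len(table[0])
    col_rank_sums = [0] * c
    for row in table:
        if len(row) != c or len(set(row)) != c:
            raise ValueError("each row needs c distinct values in this sketch")
        # Rank the entries of the row in ascending order (ranks 1..c).
        order = sorted(range(c), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            col_rank_sums[j] += rank
    return 12 * sum(s * s for s in col_rank_sums) / (r * c * (c + 1)) - 3 * r * (c + 1)


if __name__ == "__main__":
    # Complete agreement of the rows gives the largest possible value r(c - 1).
    print(friedman_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
    # Perfectly balanced column rank sums give zero.
    print(friedman_statistic([[1, 2, 3], [2, 3, 1], [3, 1, 2]]))
```

The two calls illustrate the extremes of the statistic: identical rankings in every row and exactly equal column rank sums.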

In his paper Friedman included a proof of Wilks’ that $U$ has asymptotically
the $\chi^2$-distribution with $c - 1$ degrees of freedom as $r \to \infty$.
Kendall and Smith [13] fitted to a transform of $U$ a Fisher $z$-distribution
with continuity corrections, obtaining a better approximation for small $r$
than the $\chi^2$-distribution. Wallis [43] independently proposed the use of
$\eta_r^2 = U/[r(c - 1)]$ as statistic and called it the rank correlation
ratio. Friedman in a later paper [8] on the subject, using exact values he had
calculated, together with the Kendall-Smith approximation, published
tables$^{9}$ of the 1% and 5% significant values of $U$ for $c = 3, 4, 5, 6,
7$, and sufficiently many values of $r$ so that for these $c$ and any $r$ the
significant values of $U$ are now easily available.

9 In these tables our $U$, $r$, $c$ are denoted respectively by $\chi_r^2$, $m$, $n$.

After the above lengthy discussion for the “randomized blocks” case of analysis 
of variance, it will perhaps suffice merely to mention that the “Latin square” 
case may be similarly attacked from the non-parametric point of view, and this 
has been considered by Welch [45], Pitman [29], and E. S. Pearson [24]. They 
have taken as the statistic the usual analysis of variance ratio, and the work of 
Welch and Pitman in calculating the first two moments of its subpopulation 
distribution is even more tedious than in the “randomized blocks” case. 

Part II. Non-parametric Estimation 

7. Classical results on point estimation. Throughout part II the symbol
$E$ will always denote a random sample $X_1, \ldots, X_n$ from a univariate
population with c.d.f. $F(x)$, where $F$ is an unknown member of a given class
to be stated in each case. The c.d.f. of $E$ is thus

\[
F_n(x_1, \ldots, x_n) = \prod_{i=1}^{n} F(x_i).
\]

The problems of estimation can be stated without reference to the class of
admissible $F_n$; $\Omega$ would be obvious in every case.

Let $\theta = \theta(F)$ be a real number determined by $F$ (a functional of
$F$) for $F$ in a certain class of univariate c.d.f.’s. Thus $\theta$ might be
the mean of the distribution, in which case $\theta$ would be defined for all
$F$ possessing a first moment. We shall not call $\theta$ a parameter in order
to avoid confusion with the parametric case. R. A. Fisher’s criteria of
unbiasedness and of consistency for point estimation carry over without change
from the parametric case. A statistic $T(E)$ is said to be an unbiased estimate
of $\theta$ if $\mathcal{E}(T) = \theta$. Write $E = E_n$ and $T = T_n$ to
emphasize the sample size $n$, and assume that the statistic $T_n(E_n)$ is
defined for all $n$ (or all $n >$ some $n_0$). Then we define $T_n(E_n)$ to be
a consistent estimate of $\theta$ if it converges stochastically to $\theta$,
that is, if $\Pr\{|T_n - \theta| > h\} \to 0$ as $n \to \infty$, for every
$h > 0$.

In the present paragraph it will be convenient to symbolize the class of $F$
for which the $i$-th (absolute) moment exists; we denote it by $\Omega^{(i)}$
($i = 1, 2, \ldots$). It is known$^{10}$ that a sufficient condition for the
stochastic convergence of the sample mean $\bar x$ to the population mean is
that $F \in \Omega^{(1)}$. Hence for all $F \in \Omega^{(1)}$, $\bar x$ is a
consistent estimate of the population mean; furthermore it is unbiased. If we
apply this result to the random variable $Y = X^2$, we find that for all
$F \in \Omega^{(2)}$, $\sum_{i=1}^{n} x_i^2/n$ is a consistent unbiased
estimate of the second moment of $F$ about the origin. Similar statements may
be made for higher moments. For $F \in \Omega^{(2)}$ one may show further that
with $Q$ defined as $\sum_{i=1}^{n} (x_i - \bar x)^2$, the statistics $Q/n$ and
$Q/(n - 1)$ are consistent estimates of the population variance, and the
latter is unbiased.


10 See, for example, J. L. Doob, Annals of Math. Stat., Vol. 6 (1935), p. 163.

If there exists a number $M$ such that $F(M) = \tfrac{1}{2}$ it is called the
median of the distribution. The median $\tilde x$ of a sample of odd size is
the central $X_i$ when the $X_i$ are arranged in order of magnitude; for a
sample of even size we may take $\tilde x$ to be the average of the two central
values. It may be shown$^{11}$ that $\tilde x$ is a consistent estimate of $M$
for $F$ in the subclass for which the probability density function $f(x)$ is
continuous at $x = M$ and $f(M) \neq 0$.

8. Confidence intervals for an unknown median, for the difference of medians. 

Arrange the sample in rank order and denote the result by
$Z_1 \le Z_2 \le \cdots \le Z_n$, where $Z_1, \ldots, Z_n$ is a rearrangement
of $X_1, \ldots, X_n$. The joint distribution of the $Z_i$ (or any subset of
the $Z_i$) is well known [49] if $F(x)$ is suitably restricted, which we now
assume. From this distribution theory it is easy to show that for any positive
integer $k < \tfrac{1}{2}n$, the probability that the random interval
$(Z_k, Z_{n-k+1})$ cover the unknown population median $M$ is

\[
\Pr\{Z_k < M < Z_{n-k+1}\} = 1 - 2 I_{1/2}(n - k + 1, k),
\]

where

\[
I_x(p, q) = \int_0^x t^{p-1}(1 - t)^{q-1}\,dt \Big/ \int_0^1 t^{p-1}(1 - t)^{q-1}\,dt
\]

is the incomplete Beta-distribution tabulated by K. Pearson. The practicability
of estimating $M$ by means of the above relation in the non-parametric case was
noted first by W. R. Thompson [35]. It is not difficult to calculate tables
giving, for various sample sizes $n$, the maximum $k$ for which
$\Pr\{Z_k < M < Z_{n-k+1}\} \ge .95$ or $.99$. This has been done for $n = 6$
to 81 by K. R. Nair [21], who listed the maximum $k$ as well as $n - k + 1$ and
$I_{1/2}(n - k + 1, k)$, so that the exact confidence coefficient is available.
Nair also gave asymptotic formulas which are very accurate for $n > 81$.
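
The kind of table described above is easy to regenerate. The sketch below relies on the standard identity $I_{1/2}(n - k + 1, k) = \Pr\{B \le k - 1\}$ for $B$ binomial with parameters $n$ and $\tfrac{1}{2}$, so that no Beta-function routine is needed; the function names are introduced here for illustration, and letting $k$ run up to $n/2$ inclusive is an assumption about the admissible range.

```python
from math import comb


def median_ci_coverage(n, k):
    """Confidence coefficient of (Z_k, Z_{n-k+1}) for the median.

    Equals 1 - 2 * I_{1/2}(n - k + 1, k), computed as a binomial tail.
    """
    return 1 - 2 * sum(comb(n, i) for i in range(k)) / 2 ** n


def max_k(n, level=0.95):
    """Largest k whose interval has coverage >= level, or None if none does."""
    best = None
    for k in range(1, n // 2 + 1):
        if median_ci_coverage(n, k) >= level:
            best = k
    return best


if __name__ == "__main__":
    for n in (5, 6, 20):
        k = max_k(n)
        print(n, k, None if k is None else median_ci_coverage(n, k))
```

For $n = 5$ no interval reaches the level .95, which agrees with the tabulation beginning at $n = 6$.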

It is clear how confidence intervals for the difference $d = M_2 - M_1$ of the
medians of two univariate populations, with c.d.f.’s known only to be in the
class assumed above, might be obtained by combining two probability statements
of the above kind: Let the desired confidence coefficient be $1 - \alpha$, and
form confidence intervals of the above type for $M_1$ and $M_2$ with confidence
coefficient $1 - \tfrac{1}{2}\alpha$; write them
$\Pr\{\underline{M}_i < M_i < \overline{M}_i\} \ge 1 - \tfrac{1}{2}\alpha$.
Then
$\Pr\{\underline{M}_2 - \overline{M}_1 < d < \overline{M}_2 - \underline{M}_1\} \ge 1 - \alpha$.
Solutions like this which are easily obtained by the combining method in many
problems are in general not very efficient.
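
The combining method can be sketched as follows. This is an illustration only: the helper `median_interval` picks the largest $k$ (up to $n/2$, an assumption made here) whose order-statistic interval attains the requested level, and all function names are hypothetical.

```python
from math import comb


def median_interval(sample, level):
    """Order-statistic interval (Z_k, Z_{n-k+1}) with coverage >= level."""
    n = len(sample)
    z = sorted(sample)
    best_k = None
    for k in range(1, n // 2 + 1):
        coverage = 1 - 2 * sum(comb(n, i) for i in range(k)) / 2 ** n
        if coverage >= level:
            best_k = k
    if best_k is None:
        raise ValueError("sample too small for the requested level")
    return z[best_k - 1], z[n - best_k]


def median_difference_interval(sample1, sample2, alpha):
    """Interval for d = M2 - M1 with confidence coefficient >= 1 - alpha.

    Each median gets an interval at level 1 - alpha/2, and the two
    statements are then combined as in the text.
    """
    lo1, hi1 = median_interval(sample1, 1 - alpha / 2)
    lo2, hi2 = median_interval(sample2, 1 - alpha / 2)
    return lo2 - hi1, hi2 - lo1


if __name__ == "__main__":
    print(median_difference_interval(list(range(1, 11)), list(range(11, 21)), 0.1))
```

The resulting interval is wide relative to the two component intervals, which illustrates the remark that combined solutions are in general not very efficient.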

Some work of Pitman’s [27] may be regarded as a solution of the problem of
estimating the difference of medians (or other quantiles, or means) of two
populations in a case essentially more restricted than the preceding, but more
general than the corresponding parametric case in which the distributions are
assumed to differ only in location. To describe the nature of Pitman’s result,
let us revert to the notation introduced at the beginning of section 4, but add
to the assumption that $F$ and $G$ are in a known class the restrictive
assumption that $F$ and $G$ differ only in location, that is, that
$G(x) = F(x - d)$. The problem is the interval estimation of the unknown
constant $d$. Define the random variables $Z_i = Y_i - d$. After noting that
the $m_1 + m_2$ random variables $X_1, \ldots, X_{m_1}, Z_1, \ldots, Z_{m_2}$
are all independently distributed with the same c.d.f. $F$, Pitman was able to
apply his results for the problem of two samples to show how functions
$\underline{d}$ and $\overline{d}$ of
$X_1, \ldots, X_{m_1}, Y_1, \ldots, Y_{m_2}$ could be calculated such that
$\Pr\{\underline{d} < d < \overline{d}\} \ge 1 - \alpha$ for $\nu = 0$, while
for $\nu = 2$ the equality holds. After fitting an incomplete Beta-distribution
Pitman found that the resulting approximate confidence intervals coincide with
the well known ones employing the $t$-distribution and based on the assumption
that $F$ and $G$ are normal with the same unknown variance.

11 This follows from the asymptotic distribution of $\tilde x$. See, for instance, [49], and
combine section 4.53 with Theorem (A), p. 134.

9. Confidence limits for an unknown distribution function. Consider in
an $x, y$-plane the graph $g$ of the unknown c.d.f., $g$ being the locus of the
equation $y = F(x)$, and the possibility of covering $g$ with random regions
$R(E)$ depending on the sample $E$. Wald and Wolfowitz [39] have shown how for
given $n$ and $\alpha$ it is possible in a large variety of ways to define
regions $R(E)$ such that $\Pr\{R(E) \supset g\}$, the probability that the
random region $R(E)$ cover the unknown graph $g$, is $1 - \alpha$ for all
$F \in \Omega_2$. Instead of describing their general method we shall limit
ourselves to a special case. This is a very neat solution, the necessary
distribution theory for which was developed earlier by Kolmogoroff [15].

Let $G_n(x)$ be the “empirical distribution function” of the sample: $nG_n(x)$
is the number of $X_i < x$. Define the random variable

\[
D_n = \sqrt{n}\, \sup_x |F(x) - G_n(x)|,
\]

and let $\Phi_n(\lambda)$ be the c.d.f. of $D_n$,
$\Phi_n(\lambda) = \Pr\{D_n < \lambda\}$. Kolmogoroff proved that
$\Phi_n(\lambda)$ is independent of $F \in \Omega_2$, and that as
$n \to \infty$, $\Phi_n(\lambda) \to \Phi(\lambda)$ uniformly in $\lambda$,
where $\Phi$ is defined by the rapidly converging Dirichlet series

\[
\Phi(\lambda) = \sum_{k=-\infty}^{\infty} (-1)^k \exp(-2k^2\lambda^2).
\]

A small table of values of the function $\Phi(\lambda)$ was given by
Kolmogoroff [15], and a larger one by Smirnoff [33]. Define
$\lambda_{n,\alpha}$ from $\Phi_n(\lambda_{n,\alpha}) = 1 - \alpha$, and
$\lambda_\alpha$ from $\Phi(\lambda_\alpha) = 1 - \alpha$. Values of
$\lambda_\alpha$ for $\alpha = .05, .02, .01, .005, .002, .001$ were listed by
Kolmogoroff [10]. Now $1 - \alpha$ is the probability that

\[
\sqrt{n}\, \sup_x |F(x) - G_n(x)| < \lambda_{n,\alpha}
\]

if $F \in \Omega_2$. The above inequality is equivalent to

\[
G_n(x) - \lambda_{n,\alpha}/\sqrt{n} < F(x) < G_n(x) + \lambda_{n,\alpha}/\sqrt{n} \quad (\text{all } x).
\]

If we take as $R(E)$ the intersection of the region between the graphs of the
functions $G_n(x) \pm \lambda_{n,\alpha}/\sqrt{n}$ with the strip
$0 \le y \le 1$, we have $\Pr\{R(E) \supset g\} = 1 - \alpha$. The values of
$\lambda_{n,\alpha}$ have not been tabulated, but for practical purposes of
determining an unknown c.d.f. one would usually require a large $n$, and the
tabulated values of $\lambda_\alpha$ could then be used.

With $\Phi_n(\lambda)$ defined as the c.d.f. of $D_n$ for $F \in \Omega_2$,
Kolmogoroff has shown further that for $F \in \Omega_0$,
$\Pr\{D_n < \lambda\} \ge \Phi_n(\lambda)$. This gives the beautiful result
that the above confidence belt is valid in the most general case where
$F \in \Omega_0$, in the sense that for the above defined $R(E)$,
$\Pr\{R(E) \supset g\} \ge 1 - \alpha$.
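
A large-sample version of the confidence belt can be sketched as follows. The code is an illustration under stated assumptions: it uses the limiting value $\lambda_\alpha$ in place of the untabulated $\lambda_{n,\alpha}$, finds it by bisection on the series for $\Phi(\lambda)$, and evaluates the empirical distribution function just to the right of each ordered observation. The function names, the bisection bracket and the number of series terms are choices made here.

```python
from math import exp, sqrt


def kolmogorov_cdf(lam, terms=100):
    """Limiting c.d.f. Phi(lambda) = sum over all k of (-1)^k exp(-2 k^2 lambda^2)."""
    if lam <= 0:
        return 0.0
    tail = sum((-1) ** k * exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return 1.0 + 2.0 * tail


def lambda_alpha(alpha, lo=0.3, hi=4.0, iterations=60):
    """Solve Phi(lambda) = 1 - alpha by bisection (Phi is increasing)."""
    target = 1.0 - alpha
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def confidence_band(sample, alpha):
    """Return (x, lower, upper) at each ordered observation.

    The belt is G_n(x) -/+ lambda_alpha / sqrt(n), clipped to the strip [0, 1].
    """
    n = len(sample)
    half_width = lambda_alpha(alpha) / sqrt(n)
    band = []
    for i, x in enumerate(sorted(sample), start=1):
        g = i / n  # step height of the empirical distribution function
        band.append((x, max(0.0, g - half_width), min(1.0, g + half_width)))
    return band


if __name__ == "__main__":
    print(round(lambda_alpha(0.05), 3))
    for row in confidence_band([0.1, 0.4, 0.7, 0.9], 0.05):
        print(row)
```

With only four observations the belt is almost the whole strip, which illustrates why a large $n$ is needed in practice.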

10. Tolerance limits. An ingenious formulation and solution of a non-parametric
estimation problem was given by Wilks [47]. Let us say that an interval
$(x', x'')$ covers a proportion $b$ of a population with c.d.f. $F(x)$ if
$F(x'') - F(x') = b$. In the notation of section 8, Wilks considered the
proportion $B$ covered by the interval $(Z_k, Z_{n-m+1})$ extending from the
$k$-th smallest observation to the $m$-th largest,
$B = F(Z_{n-m+1}) - F(Z_k)$. $B$ is a random variable depending on the sample
but is not a statistic since it depends also on the unknown c.d.f. $F(x)$.
However, Wilks noted that the c.d.f. $G(b)$ of $B$ is independent of
$F \in \Omega_4$, in fact, for $0 < b < 1$,

\[
G(b) = I_b(n - k - m + 1, k + m),
\]

where $I_b(p, q)$ is defined in section 8. After $k$, $m$, a fixed proportion
$b$, and a confidence coefficient $1 - \alpha$ have been chosen, the equation
$G(b) = \alpha$ determines the sample size $n$ for which we can then make the
following assertion without any knowledge of $F$ except that
$F \in \Omega_4$: The probability is $1 - \alpha$ that in a sample of size $n$
the random interval $(Z_k, Z_{n-m+1})$ will cover at least $100b\%$ of the
population.$^{12}$
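
The determination of $n$ from $G(b) = \alpha$ can be sketched numerically. The code below is an illustration only: it evaluates the incomplete Beta-distribution through the standard binomial identity $I_b(p, q) = \Pr\{B' \ge p\}$, with $B'$ binomial with parameters $p + q - 1$ and $b$, and, since $G(b)$ takes discrete values with $n$, it returns the smallest $n$ with $G(b) \le \alpha$. The function names are introduced here.

```python
from math import comb


def coverage_cdf(b, n, k, m):
    """G(b) = I_b(n - k - m + 1, k + m), the c.d.f. of the covered proportion B."""
    a = n - k - m + 1
    return sum(comb(n, i) * b ** i * (1 - b) ** (n - i) for i in range(a, n + 1))


def tolerance_sample_size(b, alpha, k=1, m=1):
    """Smallest n for which (Z_k, Z_{n-m+1}) covers at least 100b% of the
    population with probability at least 1 - alpha."""
    n = k + m  # smallest n for which the interval is defined
    while coverage_cdf(b, n, k, m) > alpha:
        n += 1
    return n


if __name__ == "__main__":
    # Extreme observations (k = m = 1), 95% of the population, alpha = .05.
    print(tolerance_sample_size(0.95, 0.05))
```

Because only $k + m$ enters the formula, the same routine serves for any split of $k + m$ between the two ends of the sample.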

Wilks considered, among other extensions of his method, tolerance limits for
multivariate distributions in which the variables are known to be independent,
and the estimation of proportions in a second sample (instead of in the
population) on the basis of a first sample [48]. The latter problem involves
the calculation of $P(b; n, N, k, m)$, the probability that if a first sample
of $n$ is taken and then a second sample of $N$, a proportion $b$ or more of
the second sample will lie in the interval $(Z_k, Z_{n-m+1})$ determined from
the first sample. Wilks’ derivation of $P$ requires the assumption that
$F \in \Omega_4$, but a simple auxiliary argument (related to the method of
randomization by ranks) will extend the validity to the case
$F \in \Omega_2$: The complete set of $n + N$ variates is independently
distributed, each with the same c.d.f. $F \in \Omega_2$. All $(n + N)!$
possible rankings (excluding the “tied” ranking $R_0$) as defined in section 2
then have the same probability $1/(n + N)!$. The fraction of these rankings for
which the statement about proportions in the second sample is correct is a
function of $b, n, N, k, m$ only, and not of $F \in \Omega_2$, and this
fraction is the desired $P$. Since $P$ is the same for all $F \in \Omega_2$ it
must of course coincide with the value calculated by Wilks for
$F \in \Omega_4$. It would be desirable for practical purposes to extend the
validity of the tolerance limits of the first paragraph, concerning proportions
in the population, at least to the case $F \in \Omega_2$. The extension to
$\Omega_2$ would follow immediately if the intuitively reasonable statement
$1 - G(b) = \lim_{N \to \infty} P(b; n, N, k, m)$ could be justified for
$F \in \Omega_2$.

12 For fixed $b$, $G(b)$ of course takes on discrete values with $n$, so one would either
choose the $n$ giving $G(b)$ the nearest value to $\alpha$ or else the greatest value $\le \alpha$.

The multivariate case when independence is not assumed was successfully
attacked by Wald [38]. We shall describe here his solution for the bivariate
case: Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be a sample from a population with
bivariate c.d.f. $F(x, y)$ of the form

\[
F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(\xi, \eta)\, d\eta\, d\xi,
\]

where $f(x, y)$ is continuous, but otherwise unknown. Plot the points
$(X_i, Y_i)$ in an $x, y$-plane and choose four (small) integers
$k_1, m_1, k_2, m_2$. Draw vertical lines (parallel to the $y$-axis) passing
through the points with the $k_1$-th smallest and $m_1$-th largest abscissas.
Considering only the $n - k_1 - m_1$ points inside these vertical lines (the
probability of equal abscissas is zero), draw two horizontal lines passing
through the points with $k_2$-th smallest and $m_2$-th largest ordinates. Let
$J$ be the rectangle bounded by the four lines and consider the proportion $B$
of the population covered by the rectangle, $B = \int_J dF(x, y)$. Then the
c.d.f. $G(b)$ of $B$ is given by the previous formula in terms of the
incomplete Beta-distribution with $k + m = k_1 + k_2 + m_1 + m_2$, and is thus
independent of $f(x, y)$. Choose $k_1, k_2, m_1, m_2, b$, and $\alpha$. Then
the equation $G(b) = \alpha$ determines the sample size $n$ for which the
probability is $1 - \alpha$ that the random rectangle $J$ will cover at least
$100b\%$ of the population. Wald showed further how a series of rectangles
instead of a single rectangle might advantageously be used in the case of
highly correlated $X, Y$.

It would be most useful to have tables of $n$ corresponding to $\alpha = .05$
and $.01$, some values of $b$ close to unity, and a few small values of
$k + m$, say, $k + m = 2, 4, \ldots, 2r$. The table could then be used for the
univariate, bivariate, $\ldots$, $r$-variate cases with various choices of
$k_i, m_i$ such that $\sum_i (k_i + m_i) = k + m$. Entries for $k + m = 4$ have
been given by Wald [38, p. 55].

Part III. Toward a General Theory 

11. The criterion of consistency. All the concepts of Part III have been
carried over from, or suggested by, corresponding ones earlier developed for
the parametric theory. Consistency of point estimation was defined in section
7. Wald and Wolfowitz [40] have generalized the notion of consistency to tests
so that it is applicable in the non-parametric case. We have heretofore
specified the hypothesis $H$ and its admissible alternatives by means of
classes of $n$-variate c.d.f.’s $F_n$. We now assume that $H$ and its
admissible alternatives can be framed as statements about one or more
populations, independent of $n$. Thus in the problem of two samples (section 4)
$H$ may be taken as the statement that the c.d.f.’s $F$ and $G$ of the two
populations are the same member of $\Omega_\nu$, while the admissible
alternatives are statements that $F$ and $G$ are any two different members of
$\Omega_\nu$. Returning to the general case, we assume that a sequence of tests
is under consideration, say, $\mathcal{T}_1, \mathcal{T}_2, \ldots$, such that
as $j \to \infty$, the size of the sample in $\mathcal{T}_j$ from each of the
populations becomes infinite. The sequence $\{\mathcal{T}_j\}$ may be called
simply a “test” and is said to be consistent if the probability of rejection of
$H$ by $\mathcal{T}_j$ approaches unity as $j \to \infty$ whenever an
admissible alternative to $H$ is true. It has been suggested [50] that
consistency is a minimal requirement for a good test. In order to allow for the
analogue of the “common best critical regions” in the parametric
theory,$^{13}$ it would be better to define consistency with respect to any
given subset of the admissible alternatives and then require consistency with
respect to the subset appropriate to the specific situation in which the test
is to be used.

Wald and Wolfowitz [40] proved that under certain restrictions on the
admissible $F, G$ in the problem of two samples their test based on runs
(section 4) is consistent, while another previously proposed test is not.
Judging from their work, we may expect that, while inconsistency proofs may be
easy, consistency proofs will be difficult.


12. Likelihood ratio tests. A definition of the Neyman-Pearson likelihood
ratio criterion$^{14}$ $\lambda$ for testing the hypothesis $H$ (we use the
notation of section 2), which would yield the usual result in the parametric
case, would be the following: Let $C(E; \delta)$ be a cube of edge $2\delta$ in
the sample space $W$ with center at the point $E$ and faces parallel to the
coordinate hyperplanes, and let $P(E; \delta \mid F_n)$ be the “probability put
into the cube by the c.d.f. $F_n$”, that is,

\[
P(E; \delta \mid F_n) = \int_{C(E;\delta)} dF_n.
\]

Define

\[
\lambda(E; \delta) = \Bigl[\sup_{F_n \in \omega} P(E; \delta \mid F_n)\Bigr] \Big/ \Bigl[\sup_{F_n \in \Omega} P(E; \delta \mid F_n)\Bigr],
\qquad
\lambda = \lambda(E) = \lim_{\delta \to 0} \lambda(E; \delta).
\]

This definition of $\lambda$ is not useful in the non-parametric case as
$\lambda$ turns out in general to be independent of $E$; the reader may easily
verify this for the problem of two samples (section 4).

Having seen now that the likelihood ratio does not carry over to the
non-parametric case in an obvious way, we are in a position to appreciate a
bold stroke by Wolfowitz [50]. He begins by limiting the critical regions to be
considered to the relatively small class obtainable by the method of ranks
(section 2). Let $R = R(E)$ be the ranking of the sample point $E$, so that the
random variable $R$ takes on the possible values $R_0, R_1, \ldots, R_s$, and
let $P(R_a \mid F_n) = \Pr\{R = R_a \mid F_n\}$. Then Wolfowitz takes the
likelihood ratio to be the following function of the ranking $R$:

\[
\Lambda(R) = \Bigl[\sup_{F_n \in \omega} P(R \mid F_n)\Bigr] \Big/ \Bigl[\sup_{F_n \in \Omega} P(R \mid F_n)\Bigr].
\]

His modified likelihood ratio test then consists of applying the method of
ranks (section 2) with $\Lambda(R)$ as the statistic, small values being
regarded as significant. If $\Omega$ is a class of continuous $F_n$, all
rankings $R \neq R_0$ have the same probability $1/s$ under the null
hypothesis, while $P(R_0 \mid F_n) = 0$ for all $F_n \in \Omega$. Then the
numerator of $\Lambda(R)$ is $1/s$, and we may thus use the denominator of
$\Lambda(R)$ as statistic with large values significant. Wolfowitz’
modification has one advantage we don’t always find with the usual parametric
method: it always leads to similar regions since it is a special case of the
randomization method.

13 J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical
hypotheses”, Phil. Trans. Roy. Soc. London, A, Vol. 231 (1933), pp. 289-337.

14 J. Neyman and E. S. Pearson, Biometrika, Vol. 20A (1928), p. 264.

In applying his method to examples Wolfowitz finds it necessary to resort each
time to an approximation in calculating his statistic $\Lambda(R)$. Instead of
taking the “sup” over $\Omega$ as in the definition, he takes it over a
subclass $\Omega'$ of $\Omega$ which lends itself more easily to calculation.
Thus in the problem of two samples with $\nu = 2$, whereas $\Omega$ is the
class defined in section 4 with $F, G$ in $\Omega_2$, the class $\Omega'$ is
the subclass of $\Omega$ obtained by further limiting $F, G$ as follows: The
$x$-axis is divided up into a number of disjoint intervals, equal to the total
number of runs in the sequence $V$ defined in connection with the
Wald-Wolfowitz test in section 4. If the $j$-th run in $V$ is a run of 1’s the
restriction $G(x) = 0$ in the $j$-th interval is imposed; if the $j$-th run is
a run of 2’s, $F(x) = 0$ in the $j$-th interval. The intervals in which $F, G$
are permitted to assign positive probability then correspond in order and
number to the two kinds of runs. With this restriction the (twice) modified
likelihood ratio statistic is found to be

\[
\sum_{i} \sum_{j} \left(l_{ij} \log l_{ij} - \log l_{ij}!\right),
\]

where $l_{ij}$ is the number of elements in the $j$-th run of $i$’s
($i = 1, 2$). Large values are significant. For large samples the asymptotic
distribution of the statistic falls out as a special case of a general theorem
of Wolfowitz.

In the same paper Wolfowitz obtained modified likelihood ratio tests for the 
problem of k samples and the problem of independence of two or more random 
variables. 

In his examples the author states that the maximizing $F_n$ in $\Omega'$ is
“essentially the same” as the maximizing $F_n$ in $\Omega$, at least for the
significant rankings $R$, and for large samples. The necessity of this
approximation procedure is somewhat disturbing, as is the restriction to the
method of ranks. Since it does not seem possible to give a definition of
likelihood ratio tests sufficiently broad to include the non-parametric case,
yet yielding the usual result in the parametric case, we are denied even the
small comfort of saying that at least in special cases the method is known to
yield optimum results. In some problems the set $\{R_a\}$ of rankings,
corresponding to the set $\{w_a\}$ of regions in $W$ which serves to separate
the $s$ points of the subpopulations $\{E'\}$ defined in section 2, is not
unique (consider for instance the problem of two samples when the populations
are bivariate), and in such cases the method as defined above would not give a
unique result. These remarks are intended to point the need for further
investigation and cannot detract from the ingenuity of the method, the first
general process that has been suggested for choosing one out of the welter of
similar regions yielded by the randomization method.

13. Wald’s formulation of the general problem of statistical inference. A
formulation of the general problem of statistical inference broad enough to
cover the non-parametric case, and including estimation and tests as well as
statistical problems classifiable under neither of these headings, has been
given by Wald [37]. This formulation extends certain concepts he had applied
earlier$^{15}$ to the parametric case.

In this last section we shall permit ourselves a somewhat more abstract
terminology and notation than before. As in section 2, $E = (X_1, \ldots, X_n)$
will denote the sample; $F_n(E)$, its c.d.f.; $W$, the $n$-dimensional
Euclidean space of $E$, the sample space; and $\Omega$, the space of admissible
$F_n$. Of central importance is a given class $\mathfrak{S}$ appropriate to the
problem, $\mathfrak{S} = \{\omega_\beta\}$, whose members $\omega_\beta$ are
(not necessarily disjoint) subsets of $\Omega$,
$\Omega = \bigcup_\beta \omega_\beta$. To every
$\omega_\beta \in \mathfrak{S}$ there corresponds a hypothesis
$H(\omega_\beta)$: $F_n \in \omega_\beta$, so that there is a 1:1
correspondence between the members of the set $\mathfrak{S}$ and those of the
set $\{H(\omega_\beta)\}$ of hypotheses. The general problem of statistical
inference, according to Wald, is the choice of a decision function $\Delta(E)$
mapping $W$ into $\mathfrak{S}$. For every $E \in W$ a decision function
$\Delta(E)$ uniquely selects an element $\omega_\beta$ of $\mathfrak{S}$,
$\omega_\beta = \Delta(E)$. Its statistical import is that when the sample
point is $E$ we agree to accept the hypothesis $H(\omega_\beta)$ determined by
$\Delta(E) = \omega_\beta$.

Before introducing any further definitions let us illustrate the preceding
ones. In any problem of testing a hypothesis, the set $\mathfrak{S}$ has just
two members $\omega_1$ and $\omega_2$ which we have heretofore denoted by
$\omega$ and $\Omega - \omega$, respectively. The decision function
$\Delta(E)$ then takes on just these two values, in fact,
$\Delta(E) = \omega_2$ for $E$ in the critical region $w$ of the test, and
$\Delta(E) = \omega_1$ for $E \in W - w$.

To illustrate the definitions in the case of point estimation, consider
estimating the median $M$ of a univariate population with c.d.f. $F(x)$.
$\Omega$ would be the class of $F_n$ of the form $\prod_{i=1}^{n} F(x_i)$
with, say, $F$ in a suitably restricted class and $F'(M) \neq 0$ (which is
sufficient to insure a unique $M$). The index $\beta$ could now be identified
with $M$, so that its domain is the real line, and
$\omega_\beta = \{F_n \mid M(F) = \beta\}$. The classes $\omega_\beta$ would be
disjoint in this case and each would contain an infinite number of $F_n$. The
problem of estimating the unknown $M$ may be said to be the choice of a
decision function $\Delta(E)$: When the sample point is $E$ we accept
$H(\omega_\beta)$: $F_n \in \omega_\beta = \Delta(E)$, meaning in this case
simply that we accept the statement that $M$ equals the $\beta$ determined by
$\Delta(E)$.

15 A. Wald, “Contributions to the theory of statistical estimation and testing
hypotheses”, Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.

Suppose next that instead of the point estimation of $M$ just discussed we are
interested in the interval estimation of $M$. We define $\Omega$ as above, and
now take the index $\beta$ to consist of a pair $a, b$ of real numbers. An
interval estimate $a < M < b$ may be regarded as an acceptance of the
hypothesis $H(\omega_{a,b})$: $F_n \in \omega_{a,b}$, where $\omega_{a,b}$ is
the subclass of $\Omega$ consisting of all $F_n$ for which $M(F)$ lies in the
interval $a < M < b$. The set $\mathfrak{S}$ now consists of all classes
$\omega_{a,b}$ with $-\infty < a < b < +\infty$. Here as in the general case of
interval estimation the classes $\omega_\beta$ of the set $\mathfrak{S}$ are
not disjoint. The decision function $\Delta(E)$ adopted in section 8 is
$\Delta(E) = \omega_{a,b}$ with $a = z_k$, $b = z_{n-k+1}$, where
$z_1 \le z_2 \le \cdots \le z_n$ is a rearrangement of the coordinates
$x_1, \ldots, x_n$ of $E$.

An example of a problem neither of estimation nor testing would be the
following: Let $\Omega$ be as above. Two real numbers $A$ and $B$ ($A < B$) are
given and it is required to decide on the basis of the sample $E$ to which of
the three classes $-\infty < M < A$, $A \le M \le B$, $B < M < +\infty$ the
unknown median $M$ belongs. Here the set $\mathfrak{S}$ would consist of three
disjoint classes $\omega_1, \omega_2, \omega_3$, where $\omega_1$ is the
subclass of $\Omega$ consisting of $F_n$ with $M(F) < A$, etc.

We return now to the general case. Before defining a “best” decision function
$\Delta = \Delta^*$, Wald asks that there be a given weight function
$\mathfrak{w}(F_n, \omega_\beta)$ defined on the product space
$\Omega \times \mathfrak{S}$. The weight function
$\mathfrak{w}(F_n, \omega_\beta)$ is a real-valued function evaluating the loss
involved in accepting $H(\omega_\beta)$, the statement that the unknown c.d.f.
of $E$ is a member of $\omega_\beta$, when the unknown c.d.f. is actually
$F_n$. If $F_n \in \omega_\beta$ we make no error in accepting
$H(\omega_\beta)$, and in this case $\mathfrak{w}$ is defined to be zero. Its
value otherwise is required to be non-negative. In this theory the choice of
the weight function is regarded as essentially not a mathematical problem, but
the choice is to stem out of the very specific situation in which the
statistical inference is to be made. In an industrial problem $\mathfrak{w}$
might be the financial loss incurred when a certain kind of error is made.

After Id is given, the decision functions A are to be restricted to the class for 
which to(F n , is a Borel-measurable function of E for all F n e ft; note that 
to depends on E only through A, not through F„ The expected value of to 
for a particular F n is called the risk function; it depends of course on the decision 
function A and the weight function to as well as on F n . Denote it by 

r(A, to | F n ) - f to(F„ , A (E)) dF n (E). 

j w 

Since the true $F_n$ is unknown, so in general will be the true value of the risk
function associated with a particular decision function $\Delta$. We might call

$$ r(\Delta, w) = \sup_{F_n \in \Omega} r(\Delta, w \mid F_n) $$

the maximum risk associated with the decision function $\Delta$. Wald defines $\Delta^*$
to be the “best” decision function relative to the weight function $w$ if the maximum
risk $r(\Delta, w)$ is minimum for $\Delta = \Delta^*$. He points out that the “best”
decision function might be defined as one which minimizes some weighted mean, taken
over all $F_n \in \Omega$, of the risk function $r(\Delta, w \mid F_n)$, but that the
above definition of the “best” decision function has certain advantages. Thus under
certain restrictions on $\Omega$ and $w$, the risk function $r(\Delta^*, w \mid F_n)$ is
independent of $F_n \in \Omega$, that is, we then know the exact value of the risk,
regardless of what the true $F_n$ may be. This is analogous to the desirable situations
where confidence intervals are known, and the probability of a false statement (to the
effect that the unknown quantity is in a given region when it is not) is then a
constant independent of the unknown quantity.
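
As a minimal sketch of the definition above, suppose (purely for illustration) that only two c.d.f.'s and three decision functions are admissible, so that the risk function can be tabulated. The rule names and risk values below are hypothetical and are not taken from Wald's work; the code simply selects the rule whose maximum risk is smallest.

```python
# Hypothetical risk table: risk[rule][cdf] plays the role of r(Delta, w | F_n).
risk = {
    "rule_a": {"F1": 0.10, "F2": 0.40},
    "rule_b": {"F1": 0.25, "F2": 0.25},
    "rule_c": {"F1": 0.05, "F2": 0.60},
}


def maximum_risk(rule_risks):
    """Maximum risk of one decision function over all admissible c.d.f.'s."""
    return max(rule_risks.values())


def minimax_rule(risk_table):
    """Decision function whose maximum risk is smallest ("best" in Wald's sense)."""
    return min(risk_table, key=lambda rule: maximum_risk(risk_table[rule]))


best = minimax_rule(risk)
print(best, maximum_risk(risk[best]))
```

In this toy table the selected rule happens to have the same risk under both c.d.f.'s, which mirrors the remark that the risk of the “best” decision function may be independent of the true $F_n$.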

Wald’s theory is suggestive and formally very satisfying, but one would like
to see some specific examples of its application to non-parametric cases. A
discouraging aspect, not shared by the older Neyman-Pearson theory, lies in the
very refinement that a decision function is declared best with respect to a very
particular weight function $w$. An attractive possibility would be to impose a
metric on $\Omega$ or on a related function space, and to let $w$ be the distance
function. In the problem of two samples for example, after metrizing $\Omega_n$, the
weight $w$ assigned to accepting $H$ might be taken as the distance between $F$ and $G$
in the notation of section 4. A suitable choice of metric might yield a weight function
appropriate to a large variety of situations. The difficulties of finding a distance
function which is intuitively satisfactory and analytically tractable in calculating
the risk function are no doubt formidable. The device of metrizing a space
of distribution functions was used by Mann and Wald in a different connection
[17], but their choice of distance function, while appropriate to their problem,
would not be satisfactory here.

Also still lacking is any general theory relating the three concepts discussed in
Part III. The following questions have been answered, at least for some specific
examples, in the parametric case, but are still untouched in the non-parametric
case: Are likelihood ratio tests consistent? Is there a simple weight function $w$
relative to which the likelihood ratio test becomes a “best” test, or asymptotically
a “best” test? If a test is “best” relative to a given weight function, with
respect to what set of alternatives is it consistent?

In conclusion let us emphasize the need for constructive methods of obtaining
“good” and “best” tests and estimates in the non-parametric case. Recalling
the history of the parametric case we may judge that half the battle was the
definition of “good” and “best” statistical inference. Progress in the
non-parametric case has been made in the direction of definition, mainly by carrying
over or modifying criteria originally advanced for the parametric case. However,
besides criteria for “good” and “best” tests and estimates, we have in the
parametric case a large body of constructive theory which may be applied in
particular examples to yield the optimum tests or estimates; thus we have the
Fisher theory of maximum likelihood statistics for point estimation, and the
constructive theorems of the Neyman-Pearson theory for the existence of critical
regions of types $A$, $A_1$, $B$, $B_1$, and the related types of “best” confidence
intervals. The contrasting lack of any general constructive methods¹⁸ at present
challenges us in the non-parametric theory.

ON THE THEORY OF SAMPLING FROM FINITE POPULATIONS

By Morris H. Hansen and William N. Hurwitz
Bureau of the Census 

I—HISTORICAL BASIS FOR MODERN SAMPLING THEORY 

The theory for independent random sampling of elements from a population
where the unit of sampling and the unit of analysis coincide was developed by
Bernoulli more than 200 years ago. The theory that would measure the gains
to be had from introducing stratification into sampling was indicated by Poisson
a century later. Subsequently, Lexis systematized previous work and provided
the theoretical basis for sampling clusters of elements.¹ The adaptation of the
work of Bernoulli and Poisson to sampling from finite populations was summarized
by Bowley in 1926 [1], approximately a century after the work of Poisson.

An impetus to sampling advancement, following some fundamental statistical 
contributions of Pearson, Fisher, and others, resulted from the work of Neyman 
when he published his paper in 1934 on the two different aspects of the
representative method [8]. In that paper he introduced new criteria of the optimum
use of resources in sampling, including the concept of optimum allocation of 
sampling units to different strata subject to the restriction that the sample have 
a fixed total number of sampling units. 

If, no matter how a sample be drawn, the cost were dependent entirely on the 
number of elements included in the sample, there would be little need for theory 
beyond the classical theories of Bernoulli and Poisson covering the independent 
random sampling of elements within strata, supplemented by the extension of 
the theory to finite populations, and the extension to optimum allocation of 
sampling units. Very often, however, in statistical investigations it is extremely 
costly, if not impossible, to carry out a plan of independent random sampling 
of elements in a population. Such sampling, in practice, requires that a listing 
identifying all the elements of the population be available, and frequently this 
listing does not exist or is too expensive to get. Even if such a listing is
available, the enumeration costs may be excessive if the sample is too widespread.
Frequently also, there are other restrictions on the sample design, such as the
requirement that enumerators work under the close supervision of a limited
number of supervisors, and as a consequence the field operations must be confined
to a limited number of administrative centers. Techniques such as cluster
sampling [2, 3, 4, 5, 6, 7, 8, 10], subsampling, and double sampling [9], have been
developed with the aim of making most effective use of available resources, while
keeping within existing administrative restrictions, and thus producing the maximum
amount of information possible within these resources and restrictions.
Neyman [8], Yates and Zacopanay [10], Cochran [2], Mahalanobis [7], and others
have made important contributions in this regard.

¹ The sampling of clusters of elements refers to the sampling of units that contain more
than one element. Examples of cluster sampling include the use of the city block or the
county as the sampling unit when the purpose of the survey is to determine the properties
of the population made up of individual persons or individual households. In these
instances, the city block or county is referred to as the cluster of elements, and the
individual person or household is referred to as the element.

We can illustrate a number of the developments indicated above in a simple 
but fairly general subsampling design. This design involves the sampling of 
clusters of elements from a stratified population and the subsampling of elements 
from each of the selected clusters, where the number of elements in each of the 
primary sampling units within a stratum is the same. 

Suppose we have a population made up of $L$ strata, with the $i$-th stratum
containing $M_i$ primary sampling units of $N_i$ elements each. The individual element
will be the subsampling unit. Let $X_{ijk}$ be the value of some characteristic of the
$k$-th element of the $j$-th primary sampling unit in the $i$-th stratum, and assume
that the character to be estimated is

$$ \bar{X} = \sum_{i=1}^{L} \sum_{j=1}^{M_i} \sum_{k=1}^{N_i} X_{ijk} \Big/ \sum_{i=1}^{L} M_i N_i. \tag{1} $$

For example, if $\bar{X}$ is the average income per household in a given city, $X_{ijk}$
might be the income of the $k$-th household in the $j$-th city block in the $i$-th ward,
where the household is the subsampling unit, the city block is the primary
sampling unit, and the stratification has been by wards. Suppose, further, that
we sample $m_i$ primary units from the $i$-th stratum, and subsample $n_i$ elements
from each of the primary units sampled from that stratum.

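The “best linear unbiased estimate” [8] of $\bar{X}$ from the sample will be

$$ \bar{X}' = \sum_{i=1}^{L} \frac{M_i N_i}{m_i n_i} \sum_{j=1}^{m_i} \sum_{k=1}^{n_i} X_{ijk} \Big/ \sum_{i=1}^{L} M_i N_i, \tag{2} $$

and the variance of $\bar{X}'$ is

$$ \sigma^2_{\bar{X}'} = \frac{1}{\left(\sum_{i=1}^{L} M_i N_i\right)^2} \sum_{i=1}^{L} M_i^2 N_i^2 \left\{ \frac{M_i - m_i}{M_i - 1} \, \frac{\sum_{j=1}^{M_i} (\bar{X}_{ij} - \bar{X}_i)^2}{M_i m_i} + \frac{N_i - n_i}{N_i - 1} \, \frac{\sum_{j=1}^{M_i} \sum_{k=1}^{N_i} (X_{ijk} - \bar{X}_{ij})^2}{M_i N_i m_i n_i} \right\}, \tag{3} $$

where $\bar{X}_{ij} = \sum_{k=1}^{N_i} X_{ijk} / N_i$ and $\bar{X}_i = \sum_{j=1}^{M_i} \sum_{k=1}^{N_i} X_{ijk} / M_i N_i$.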
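
The estimate in Eq. (2) can be sketched in a few lines. The function below is an illustration only: the dictionary layout (`M`, `N`, `sample`) and the numbers are hypothetical, and each stratum's sample is assumed to be supplied as a list of sampled primary units, each a list of subsampled element values of equal length.

```python
def estimate_mean(strata):
    """Stratified two-stage estimate of the population mean per element.

    Each stratum is a dict with (hypothetical keys):
      "M": number of primary units in the stratum,
      "N": number of elements in every primary unit of the stratum,
      "sample": list of m sampled primary units, each a list of n values.
    """
    weighted_total = 0.0
    population_size = 0
    for stratum in strata:
        M, N = stratum["M"], stratum["N"]
        sample = stratum["sample"]
        m = len(sample)        # primary units sampled
        n = len(sample[0])     # elements subsampled per primary unit
        sample_sum = sum(sum(unit) for unit in sample)
        # Inflate the sample sum by the reciprocal of the sampling fraction.
        weighted_total += (M * N) / (m * n) * sample_sum
        population_size += M * N
    return weighted_total / population_size


strata = [
    {"M": 10, "N": 4, "sample": [[1, 3], [5, 7]]},
    {"M": 5, "N": 2, "sample": [[10, 20]]},
]
print(estimate_mean(strata))
```

Here the first stratum contributes $(40/4) \cdot 16 = 160$ and the second $(10/2) \cdot 30 = 150$ to the estimated total, which is then divided by the 50 elements in the population.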

These formulas have no practical utility in designing samples unless there are,
in addition, some considerations of differential costs. Cost relationships sometimes
may be stated explicitly as a function of the $m_i$ and the $n_i$, or, what is
frequently the case, they may be approximated sufficiently through intuition
and speculation to guide one to a reasonable decision among the various alternatives
implied by the design.
If we know the cost function we proceed to determine the values of the $m_i$ and
the $n_i$ that make $\sigma^2_{\bar{X}'}$ a minimum for a fixed total expenditure, and
also subject to any other restrictions that may be imposed. This theory provides a
basis for determining the optimum allocation of the sampling ratios to the various
strata, and to primary and secondary sampling units within each stratum.

Such developments, however, must be regarded as only the first step in sample 
design. We cannot go forward if we only know that the optimum sample design 
is some particular mathematical function of the population parameters and the 
cost factors; we need also to know something about the relative magnitudes of 
certain parameters in the particular populations under consideration, as well as
something about the costs associated with the various sampling and estimating 
operations. 

Thus, considerable work in recent years has been done on the study of the 
relative magnitudes of variances and covariances between and within various 
types of sampling units and on the study of costs and types of cost functions 
that operate. Work is being done in this field by the Department of Agriculture 
in connection with sampling for agricultural items, and is being done also in the 
Bureau of the Census, and in other places. 

II—THE DIRECTION OF MORE RECENT DEVELOPMENTS 

The sampling procedure indicated above involves as a first step the definition 
of the system of sampling, such as whether the sampling method will involve 
cluster sampling, double sampling, or subsampling, and along with this the 
definition of the stratification and the sampling units. The second step is that
of determining the method of estimation, together with the allocation of the
sampling units.

The first step, that of defining the sampling system, is taken with a view to
administrative feasibility and sampling efficiency, but no simple procedure exists
which leads one uniquely to the selection of a system except perhaps by the
impractical method of listing and examining all possible alternatives and accepting
one on some criterion of best. However, given the definition of a population
character to be estimated, and a sampling system, a simple procedure is available
that will provide a unique solution to the second step, provided we accept some
criterion as to what “best” means, such as the best linear unbiased estimate,
subject to any cost or administrative restrictions that may be imposed. Such
criteria lead us to both our estimating procedure and our allocation of sampling
within the sampling system defined.

While no theory with practical applicability has been developed which indicates
a “best” system of sampling, and at the same time indicates the “best”
estimating procedure and sampling allocation, some progress in the choice of
improved sampling systems and estimating procedures has been made. The
developments in the following two directions appear to us to be particularly
pertinent.

1. Modifications in some of the fairly generally accepted criteria of good
sample estimates have led to more reliable sample results for some types of
sampling systems (some of these are mentioned in Sec. III);

2. Some principles are emerging that have led to improved determination of
the sampling units, the strata, and other aspects of the sampling system
(some efforts at formulating such principles are reported in Secs. IV, V and
VI).

We shall summarize, principally, some of the recent work in the Census, and
in so doing shall mention some work of others that is closely related. Most of
the work that we shall summarize relates to problems where the sampling units
are clusters of elements and vary in size.


III—MODIFICATIONS IN THE CRITERIA FOR GOOD ESTIMATES 

The estimate given in the general subsampling problem formulated in Sec. I
satisfies the criterion of the “best linear unbiased estimate.” Also, as far as our
experience has indicated, this estimate is frequently the most efficient one for
populations of the form described, that is, where the number of elements in each
sampling unit within a stratum is the same. However, if the numbers of elements
differ between sampling units, a biased but consistent estimate can frequently
be found that has a substantially smaller mean square error² than the
best linear unbiased estimate.

For example, consider the case where clusters of elements are the sampling units
and we want to estimate $\bar{X} = \sum_{i=1}^{M} X_i \big/ \sum_{i=1}^{M} N_i$, the
average value per element of some specified characteristic. Here $M$ is the number of
sampling units in the population, $X_i$ is the aggregate value of the specified
character for all elements in the $i$-th cluster, and $N_i$ is the number of elements
in that cluster. The joint distribution of $X_i$ and $N_i$ is unknown, but
$\sum_{i=1}^{M} N_i = N$ is known. Under these circumstances the “best linear unbiased
estimate” of $\bar{X}$ from a sample of $m$ clusters turns out to be
$\frac{M}{m} \sum_{i=1}^{m} X_i / N$. However, a smaller mean square error is
often obtained by the use of a ratio estimate from the sample such as
$\sum_{i=1}^{m} X_i \big/ \sum_{i=1}^{m} N_i$. This estimate is excluded by the “best
linear unbiased” criterion because it is nonlinear and biased, although the bias is
usually negligible and the estimate is consistent. Since the best linear unbiased
estimate of $\bar{X}$ requires the knowledge of $N$, the sample ratio has a further
advantage in that it can be used even when $N$ is not known.
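
The comparison above can be checked exactly on a small artificial population by enumerating every possible sample of clusters. The sketch below is illustrative only: the five clusters are hypothetical, chosen so that the aggregate $X_i$ is roughly proportional to the size $N_i$, and samples are assumed to be drawn without replacement with equal probability.

```python
from itertools import combinations

# Hypothetical population of M = 5 clusters: (aggregate X_i, size N_i).
clusters = [(12, 4), (30, 10), (7, 2), (19, 6), (24, 8)]
M = len(clusters)
N = sum(size for _, size in clusters)
true_mean = sum(total for total, _ in clusters) / N


def unbiased_estimate(sample):
    """Linear unbiased estimate (M/m) * sum(X_i) / N; requires N to be known."""
    return (M / len(sample)) * sum(total for total, _ in sample) / N


def ratio_estimate(sample):
    """Ratio estimate sum(X_i) / sum(N_i); does not require N."""
    return sum(total for total, _ in sample) / sum(size for _, size in sample)


def mean_square_error(estimator, m):
    """Exact mean square error over all equally likely samples of m clusters."""
    samples = list(combinations(clusters, m))
    return sum((estimator(s) - true_mean) ** 2 for s in samples) / len(samples)


mse_unbiased = mean_square_error(unbiased_estimate, 2)
mse_ratio = mean_square_error(ratio_estimate, 2)
print(mse_unbiased, mse_ratio)
```

For a population of this kind the ratio estimate varies only as much as the per-element averages of the clusters vary, whereas the unbiased estimate also reflects the variation in cluster sizes.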

A recent paper by Cochran [3] gives a number of consistent though biased estimates
of $\bar{X}$ that make use of the least square estimate of the linear regression of
$X_i$ on $N_i$. These estimates generally have a smaller mean square error than
either the best unbiased linear estimate or the simple ratio estimate given above.
However, they require knowledge of $N$, as does the best linear unbiased estimate,
and in addition may require detailed tabulations and considerable clerical work
as a part of the estimating process.

² In this paper the terms “mean square error” and “variance” are used interchangeably
to refer to $E(X - X')^2$ when $EX'$ is equal to $X$, the population character to be
estimated. When $EX'$ is not equal to $X$, however, $E(X - X')^2$ will be referred to
only as the “mean square error.” Since, under these latter circumstances,
$E(X' - X)^2 = E(X' - EX')^2 + (EX' - X)^2$, the mean square error is equal to the
variance of $X'$ plus the contribution due to the bias.

Both types of biased estimates mentioned above are consistent, and usually
have a smaller mean square error than the best linear unbiased estimate for
sampling systems in which the sampling units vary in size. Thus, improved
sample estimates will be obtained by modifying the “best linear unbiased
estimate” criterion to admit estimates that are nonlinear and biased, provided
they are consistent and have a smaller mean square error than the best linear
unbiased estimate.

IV—IMPROVEMENTS IN THE SPECIFICATIONS OF 
SAMPLING SYSTEMS 

A great deal can be done to improve sampling designs through improved
specification of the sampling system even though one has only a limited knowledge of
the manner in which the population is likely to be made up, and no specific 
information concerning the particular population parameters involved (see 
Sec. VI). 

1. The sizes of sampling units. A number of recent investigations have
indicated the desirability, with costs considered, of keeping the size of cluster
very small when clusters of elements are used as the sampling unit in field
surveys [2, 5, 6, 7, 8]. It is important to point out, however, that this principle is
not necessarily applicable to subsampling systems, and that the use of large
clusters as the primary sampling units in a system involving subsampling may
yield distinct gains over the use of smaller clusters without subsampling. Moreover,
one of the often recurring problems in large-scale studies is the designing of
sample surveys within stringent administrative restrictions on the number of
different locations in which operations can be carried on. Under such restrictions
a procedure commonly used is to choose a limited number of existing
political units, such as counties, as the primary sampling units, and then to
subsample units such as blocks, small rural areas, or households. Under the
circumstances, if the numbers of primary subsampling units to be included in the
sample are assumed to be held constant, the use of larger primary sampling units
than the existing political units would have the effect of decreasing the sampling
variance.

The advantage of using large primary units in subsampling is evident in the
simple case when the original units, each having the same number of elements,
are consolidated to form half as many enlarged primary units, each twice as large
as the original units. The variance between the enlarged primary units will be
$\sigma_{2b}^2 = \tfrac{1}{2} \sigma_{1b}^2 (1 + \rho)$, where $\sigma_{1b}^2$ is the
variance between the original primary units, and $\rho$ is the correlation between the
units that are paired. The correlation coefficient will be close to zero (exactly
equal to $-1/(M - 1)$, where $M$ is the number of original primary units) if the
pairing is done at random, and it follows that the variance between counties is then
cut at least in half. Ordinarily, $\rho$ will be greater than zero if the paired units
are required to be contiguous. However, through choosing for consolidation those
contiguous units that are as different as possible, $\rho$ is made as small as
possible, and in some instances this minimum value may even be negative. In any event,
the smaller the value that $\rho$ takes on, the greater the reduction of the sampling
variance between primary units from the use of enlarged units. While the sampling
variance within primary units is increased by such consolidations, the increase is
slight, and the total sampling variance is almost invariably decreased (see Appendix,
Section 1).
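
The relation $\tfrac{1}{2}\sigma^2(1 + \rho)$ for the variance between enlarged units can be verified numerically. The sketch below rests on stated assumptions: the six unit means and their pairing are hypothetical, all variances are taken about the overall mean with divisor equal to the number of units, and $\rho$ is computed as the covariance between paired units divided by the variance between the original units.

```python
def variance_between(values):
    """Variance about the mean, with divisor equal to the number of values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pair_correlation(pairs):
    """Correlation between paired units, measured about the overall mean."""
    units = [v for pair in pairs for v in pair]
    mean = sum(units) / len(units)
    covariance = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return covariance / variance_between(units)


# Hypothetical means of six original primary units, consolidated in pairs.
pairs = [(10.0, 14.0), (11.0, 19.0), (16.0, 8.0)]
units = [v for pair in pairs for v in pair]

sigma2_original = variance_between(units)
rho = pair_correlation(pairs)
sigma2_enlarged = variance_between([(a + b) / 2 for a, b in pairs])

print(sigma2_original, rho, sigma2_enlarged)
print(0.5 * sigma2_original * (1 + rho))
```

In this example the units paired together are deliberately unlike, so $\rho$ is negative and the variance between enlarged units falls to well under half of the original variance.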

The restriction on extending the consolidation of primary units is introduced by
the increased cost of subsampling within larger and larger areas. This increased
cost is to be weighed against the decreased variance. If the cost restriction
were not sufficiently severe, consolidation would proceed to the point of eliminating
the use of primary sampling units altogether, and the subsampling units
would be selected independently throughout the entire stratum.

2. Subsampling where the primary units are of unequal size. Use of probability
proportionate to size in subsampling. A subsampling system frequently
followed, whether or not the primary sampling units vary in size, involves the
selection of one or more primary units from each stratum with the probability
of selection the same for each primary unit in the stratum, and the subsampling
of a fixed proportion of the subsampling units from the selected primary unit.
When the primary units vary in size this subsampling system has some administrative
disadvantages that arise because the number of subsampling units
to be included in the sample will vary with the number of elements in the selected
primary unit. (The term “size” of sampling unit as used in this paper
refers to the number of elements in the sampling unit.)

The disadvantages in the above system have led in some instances to the specification
of a second subsampling system in which, although the primary units
were selected with equal probability, the subsampling has been of a constant
number rather than of a constant proportion.

A third subsampling system that can be recommended over both the above
systems is to make the probability of selection of a primary unit proportionate
to its size and then to subsample a constant number of subsampling units.

We shall assume that for all three systems only one primary unit is selected
from each stratum. Stratification to this degree leads to a smaller sampling
variance than does less extensive stratification. For simplicity in making comparisons,
we shall assume, furthermore, that the subsampling unit is the element
of analysis and that the sample estimate used is of the form
$X' = \sum_h N_h \bar{X}_h \big/ \sum_h N_h$, where $\bar{X}_h$ is the sample average,
for the $h$-th stratum, of the character being estimated, and $N_h$ is the size of that
stratum. This estimate, which is frequently used, is biased for the first two systems
but unbiased for the recommended system. However, an unbiased estimate, say the “best”
linear unbiased estimate, for the first two systems generally has a much larger mean
square error than the biased estimates used in these comparisons and hence has not
been considered in the comparisons which follow (see Sec. VII, footnote 7).

The first two subsampling systems mentioned are about equally efficient when
the number of subsampling units drawn from each primary unit is reasonably
large, but each will usually have a larger mean square error than will the recommended
system. The difference between the mean square errors of either of the
first two and that of the recommended design is given approximately by

(4) ^ £ QkN^I |X p^Nh - £ 

where, within the $h$-th stratum, $N_{hj}$ is the number of elements in the $j$-th
primary sampling unit, $\bar{N}_h$ is the average size of primary sampling unit, $Q_h$
is the number of primary sampling units, $\rho_{hj}$ is the intra-class correlation
between elements within the $j$-th unit, and $\sigma_h^2$ is the variance between
individual elements within the stratum; $L$ is the number of strata. (See Section 2 of
the Appendix for the development of this difference.)

This difference, which is a multiple of the average covariance between the
$N_{hj}$ and $\rho_{hj}$, will be positive if $N_{hj}$ and $\rho_{hj}$ are negatively
correlated, and this is exactly the situation that exists in most practical problems
we have encountered in sampling for social and economic statistics (see Sec. VI).

The reduction in the mean square error arises because the recommended design
provides a more nearly optimum allocation of sampling as between large
and small sampling units than do the other two. It might be possible, of course,
as another alternative, to stratify the primary units by size and then allocate
sampling to the various strata on the basis of optimum sampling considerations.
However, this would mean that some other and perhaps more important modes
of stratification would be sacrificed, and moreover, the optimum allocation of
sampling between the larger and smaller units could only be guessed at in most
practical problems. Furthermore, it usually is not possible to stratify on size
to the point that there is no variation in the sizes of units within a stratum.

The sample estimate from the recommended system is unbiased whereas the 
estimates from the other two are usually biased, and sometimes fairly seriously 
so. (For a proof of this statement see Appendix, Section 1, and see also Sec. 
VII for a numerical illustration.) 
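
The contrast in bias can be illustrated for a single stratum. The following is a sketch under stated assumptions, not the proof given in the Appendix: the three primary units are hypothetical, one unit is selected per stratum, and the subsample average is assumed to be an unbiased estimate of the mean of the selected unit, so that the expected value of the stratum estimate is the probability-weighted average of the unit means.

```python
# Hypothetical stratum: each primary unit is a list of element values.
units = [
    [2.0, 4.0],
    [10.0, 10.0, 10.0, 10.0, 10.0, 10.0],
    [1.0, 1.0, 4.0],
]
sizes = [len(unit) for unit in units]
unit_means = [sum(unit) / len(unit) for unit in units]
stratum_mean = sum(sum(unit) for unit in units) / sum(sizes)


def expected_estimate(probabilities):
    """Expected subsample average when one unit is drawn with these probabilities."""
    return sum(p * mean for p, mean in zip(probabilities, unit_means))


equal_probabilities = [1 / len(units)] * len(units)
proportionate_probabilities = [size / sum(sizes) for size in sizes]

bias_equal = expected_estimate(equal_probabilities) - stratum_mean
bias_proportionate = expected_estimate(proportionate_probabilities) - stratum_mean
print(bias_equal, bias_proportionate)
```

With probabilities proportionate to size, each unit mean is weighted by its share of the elements, which reproduces the stratum mean exactly; with equal probabilities the large unit is under-weighted and the estimate is biased.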

The use of probability proportionate to size serves to decrease only the sampling
variation between primary units and has very little effect on the sampling
variance within. Therefore, the recommended design shows its greatest
advantage over the two alternatives when the contribution of the mean square
error between primary units to the total mean square error is large.

Ordinarily, the actual sizes of the primary sampling units will not be known,
but numbers may be known that are highly correlated with the sizes. For
example, ordinarily we will not know the populations of blocks or of cities or
counties at the time a sample is taken, but we may know their populations at the
preceding census. Under these circumstances the primary units may be sampled
with probabilities proportionate to the previously known (or their estimated)
sizes, but if this is done the subsampling is to be modified in order to take account
of the changes in the sizes between the two dates. If the actual sizes are known,
the constant number taken from the selected primary unit in the $h$-th stratum is
$n_h = t_h N_h$, where $t_h$ is the sampling ratio assigned to the stratum, and $N_h$ is
the total number of elements in the stratum. The subsampling ratio within the
selected primary unit, therefore, is $t_h N_h / N_{hj}$, where $N_{hj}$ is the number of
elements in the selected unit. On the other hand, if there is available only a measure
of size $P_{hj}$ highly correlated with the actual sizes of the units $N_{hj}$, and if the
probability of selection of the primary unit has been proportionate to the $P_{hj}$,
the subsampling ratio in the selected primary unit will be equal to $t_h P_h / P_{hj}$,
where $P_h$ is the measure of size of the entire stratum, and $P_{hj}$ is the measure of
size of the selected primary unit. The variance of a sample estimate where
measures of size are used is given subsequently in this paper (see Eq. (9)).
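
One consequence of the subsampling ratio $t_h P_h / P_{hj}$ is that the over-all probability of including any element is the same whichever primary unit is drawn. The sketch below, with hypothetical measures of size and a hypothetical sampling ratio, multiplies the probability of selecting each unit by the subsampling ratio within it.

```python
def subsampling_ratio(t_h, P_h, P_hj):
    """Subsampling ratio within the selected primary unit: t_h * P_h / P_hj."""
    return t_h * P_h / P_hj


# Hypothetical measures of size P_hj for the primary units of one stratum.
measures = [120, 300, 80]
P_h = sum(measures)   # measure of size of the entire stratum
t_h = 0.02            # sampling ratio assigned to the stratum (assumed)

overall_probabilities = [
    (P_hj / P_h) * subsampling_ratio(t_h, P_h, P_hj) for P_hj in measures
]
print(overall_probabilities)
```

The selection probability $P_{hj}/P_h$ and the subsampling ratio cancel, leaving $t_h$ for every unit; the sketch assumes that $t_h P_h / P_{hj}$ does not exceed one for any unit.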

3. The use of area substratification within primary strata in a subsampling 
system. Another modification, which will be called area substratification 
within primary strata, may be particularly useful where a relatively small sample 
is required from a population covering a large area, and where operations must 
be confined to a limited number of centers. 

Some preliminary remarks are necessary before area substratification can be
explained. Area substratification requires (a) that the entire population to be
sampled be divided into areas that will serve as primary sampling units; (b) that
these units be further subdivided into a number of sub-areas; and (c) that certain
summary statistical information be available for each of the sub-areas in advance
of drawing the sample. The information that must be known for the sub-areas
includes a reasonably good measure of their sizes (perhaps the total population,
total dwelling units, or total farms) and other information which is indicative of
the characteristics of the area, such as whether predominantly farm or nonfarm,
predominantly white or colored, etc. The sub-areas, when grouped into homogeneous
classes, will serve only to determine the substrata described subsequently,
and will not ordinarily serve as the subsampling units, which may be
defined independent of the sub-areas.

The definition of the primary sampling units and the classification of them
into strata proceed as indicated earlier, with the primary units made as internally
heterogeneous as possible within strata that are as homogeneous as possible. It
will be assumed that only one primary unit is sampled from each stratum, and
that the probability of selecting the $j$-th primary unit within the $h$-th stratum is
proportionate to $P_{hj}$, where $P_{hj}$ is the measure of size of the primary unit and
is equal to the sum of the measures of size of the sub-areas that it contains. It will
be assumed, also, that $t_h$, the over-all sampling ratio to be used within the $h$-th
stratum, has been determined for all strata on the basis of considerations of
optimum allocation.
The introduction of area substratification within primary strata may then be
accomplished as follows:

(a) The sub-areas within each primary stratum are classified into substrata
on the basis of their characteristics. (For example, they may be classified
into predominantly farm and predominantly nonfarm sub-areas, and
these further classified on the basis of the average size of farm or average
rental value of the dwelling units. In such a case, the sub-areas within
the primary stratum that are predominantly farm and that have average
rental values lying within a specified interval constitute a substratum.)

(b) The sub-areas within the primary unit selected from each primary stratum 
are classified into the same substrata. 

(c) Subsampling units are defined within each of the substrata within the
selected primary units. The number of subsampling units defined within
that part of the $i$-th substratum that is contained within the $j$-th primary
unit is denoted by $M_{hij}$. (Various types of subsampling units may be
defined, such as the individual person, farm, dwelling unit, or structure, a
very small area, etc. The subsampling units need be defined only within
the selected primary sampling units.)

(d) The number of subsampling units to be included in the sample from the
$i$-th substratum within the selected ($j$-th) primary sampling unit is

$$ m_{hij} = M_{hij} \, t_h \, P_{hi} / P_{hij}, \tag{6} $$

where $P_{hij}$ is the measure of size of that part of the $i$-th substratum that
lies within the $j$-th primary unit, and $P_{hi} = \sum_{j} P_{hij}$ is the sum of the
measures of size of the sub-areas contained in the $i$-th substratum of the
$h$-th primary stratum. This method of allocating the subsampling provides
that the subsample drawn from the selected primary unit is representative,
so far as possible, of the entire stratum, rather than of the particular
primary unit that happens to be included in the sample from that
stratum. To illustrate, suppose the numbers of persons in sub-areas from
the 1940 census are used as the measures of their sizes, and that the sub-areas
are classified into substrata on the basis of their characteristics in
1940 as indicated by the 1940 Decennial Census of Population. The
allocation of the subsampling indicated above then provides that if the
proportion of the total population residing in sub-areas that are predominantly
farm is 30 percent, the sample will be drawn in such a manner
that 30 percent of the 1940 population expected in the sample would be
from the predominantly farm sub-areas, even though, in the selected
primary sampling unit, perhaps only 15 percent of the 1940 population
might reside in such areas.

(e) The population character to be estimated is

$$ X = \sum_{h=1}^{L} \sum_{j=1}^{Q_h} \sum_{i=1}^{S_h} \sum_{k=1}^{M_{hij}} X_{hijk}, $$

where $X_{hijk}$ is the aggregate value of a specified characteristic for all of the
elements contained within the $k$-th subsampling unit in the $i$-th substratum
of the $j$-th primary unit; $S_h$ is the number of substrata and $Q_h$ is the
number of primary units in the $h$-th primary stratum; and $L$ is the number
of primary strata. ($X$ might be the total number of workers in the
United States, or the total number of farm laborers, etc.) An estimate of
$X$ from the sample is

$$ X' = \sum_{h=1}^{L} \frac{1}{t_h} \sum_{i=1}^{S_h} \sum_{k=1}^{m_{hij}} X_{hijk}. \tag{7} $$


No summation over j is involved, because only one primary unit is drawn
from the h-th stratum. This is a very simple estimate, involving a sum
weighted only at the primary strata level. If the $t_h$ are all set equal to
t, i.e., if a constant proportion is sampled from each stratum, the estimate
becomes merely the total number of elements in the sample having the
specified characteristic multiplied by 1/t, the reciprocal of the sampling
ratio.
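
The allocation rule and estimate (7) can be illustrated with a minimal Python sketch. It assumes the relation quoted from (5), namely $t_h = m_{hij}P_{hij}/(M_{hij}P_{hi})$, so that the subsampling rate in the i-th substratum of the selected primary unit is $t_h P_{hi}/P_{hij}$. The function names, the stratum figures (chosen to match the 30 percent and 15 percent farm illustration), and the sample values are all hypothetical.

```python
from typing import Dict, List


def subsampling_rates(t_h: float,
                      stratum_size: Dict[str, float],
                      unit_size: Dict[str, float]) -> Dict[str, float]:
    """Rate m_hij / M_hij for each substratum of the selected primary unit.

    Assumes t_h = m_hij * P_hij / (M_hij * P_hi), i.e. rate = t_h * P_hi / P_hij.
    """
    return {i: t_h * stratum_size[i] / unit_size[i] for i in unit_size}


def estimate_total(sample: Dict[str, List[float]], t: Dict[str, float]) -> float:
    """Estimate (7): for each stratum h, (1 / t_h) times the sample aggregate."""
    return sum(sum(values) / t[h] for h, values in sample.items())


# Hypothetical stratum: 30 percent of its 1940 population is in farm sub-areas,
# but only 15 percent of the selected primary unit's population is.
P_hi = {"farm": 300_000.0, "nonfarm": 700_000.0}   # stratum totals by substratum
P_hij = {"farm": 15_000.0, "nonfarm": 85_000.0}    # selected primary unit
rates = subsampling_rates(0.0005, P_hi, P_hij)

# Expected 1940 population drawn from each substratum of the selected unit.
expected = {i: rates[i] * P_hij[i] for i in P_hij}
farm_share = expected["farm"] / sum(expected.values())

# Hypothetical sample aggregates X_hijk, pooled over substrata within each stratum.
sample = {"stratum_1": [12.0, 9.0, 20.0, 18.0, 25.0], "stratum_2": [7.0, 30.0, 28.0]}
x_prime = estimate_total(sample, {"stratum_1": 0.01, "stratum_2": 0.02})
```

In this hypothetical the farm sub-areas are sampled at 1 in 100 and the nonfarm sub-areas at roughly 1 in 243, which is what brings the expected farm share of the sample up to the stratum's 30 percent.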

The allocation of the subsampling indicated above may be deviated from and
the controls of area substratification can still be maintained if proper
modifications are made in the sample estimate. In this event, differential weighting
must be introduced at the substrata level rather than only for the primary strata.

The definition of heterogeneous primary sampling units, the proper
classification of them into strata, and the use of probabilities proportionate to the
measures of size in the selection of the primary units are particularly desirable if area
substratification is used. If these are not introduced the likelihood of making
substantial gains through the use of area substratification is decreased. The
definition of the primary strata should be made in conjunction with the definition
of the substrata, and should insure that each primary unit has adequate
representation of each substratum that is to be defined within that primary stratum.
With this restriction observed, the number of significant substrata that can be
defined will be limited by the heterogeneity of the primary units. Thus, in
order to provide for substratification into predominantly farm and
predominantly nonfarm areas, the primary sampling units should be defined so that both
farm and nonfarm areas are represented in each unit. This procedure not only
makes area substratification more effective, but improves the efficiency of the
sample in making separate estimates for such classes of the population.
However, if this procedure cannot be adhered to exactly in practice, primary units in
which certain of the substrata are not represented will occasionally come into the
sample. One alternative when this occurs is to combine certain substrata;
another is to exclude such primary units from the sample.

Since the number of primary strata is restricted by the number of primary
units to be sampled, it is wasteful to set up strata at the primary level with
respect to sources of variation that can be controlled adequately through area
substratification. For example, if farm areas and nonfarm areas are to be
distinguished in the substrata, the primary strata should not be exhausted by
classifying the primary units into a large number of strata by percent farm (percent
of total population in primary unit living on farms), since the effect of the
substratification is to control the variation in the percentage farm. Limiting the
number of percentage farm classes at the primary level makes possible the use
of other modes of stratification that will control on farm type, or on the
industrial character of the nonfarm population, or on some other similar criteria.

Area substratification is to be distinguished from the fairly commonly used
method of specifying the number of elements to come into the sample from each
of several different classes of elements, whether such quotas are fixed to make the
sample correspond with the specified characteristics of the entire primary
stratum or of the selected primary sampling unit. The method of fixing quotas and
instructing interviewers or enumerators to obtain a given number of elements
(persons, dwelling units, farms, voters, etc.) having various specified
characteristics has a fundamental weakness that is avoided in area substratification
within primary strata. Such quotas ordinarily must be set on the basis of
previous information or rough estimates, and thus cannot accurately reveal
changing characteristics of the population. Area substratification, on the other hand,
uses previous information to insure the proper representation of various types of
areas in the sample. The numbers of elements obtained with various specified
characteristics are determined from the population as it is, and not as it was at
some previous date. In times of rapid change the fixing of quotas on the basis of
previous information may introduce increasingly serious biases.

The gain from using previously available information in stratifying areas
arises from the fact that there is a high correlation in the characteristic of an
area from time to time over a period of several years. An area that is
predominantly farm at one date ordinarily will be predominantly farm a few years
later. Similarly, while very substantial shifts in population may occur, the
numbers of persons in a set of areas at one time ordinarily will be very highly
correlated with the numbers a few years later. However, area substratification does
not depend on the fact that no shifts occur. If shifts have occurred it will
measure them. If the shifts have been sufficient to completely alter the
character of most small areas, it will still provide estimates revealing the changing
character of the population, but under these circumstances the efficiency of the
method is decreased.

V—EXPECTED VALUES AND VARIANCES FOR THE SUBSAMPLING 

SYSTEM INCORPORATING THE PRINCIPLES OUTLINED ABOVE 

The system of sampling incorporating the principles of enlarged primary
units, the selection of primary units with probabilities proportionate to the
measures of size, and area substratification will be examined more fully below.
It will be referred to, for convenience, as the specified subsampling system.

1. The expected value of an estimated total for the specified subsampling
system. All summations in the formulas below are over the population unless
otherwise indicated. The expected value of X' as defined in Eq. (7) is

$$EX' = \sum_h \sum_j \sum_i \sum_k \frac{1}{t_h}\,\frac{P_{hj}}{P_h}\,\frac{m_{hij}}{M_{hij}}\,X_{hijk}.$$

From (5), $t_h = m_{hij}P_{hij}/(M_{hij}P_{hi})$, and therefore

$$EX' = \sum_h \sum_j \sum_i \sum_k \frac{P_{hi}}{P_{hij}}\,\frac{P_{hj}}{P_h}\,X_{hijk}
     = \sum_h \sum_j \sum_i P_h\,\frac{P_{hi}}{P_h}\,\frac{P_{hj}}{P_h}\,\frac{X_{hij}}{P_{hij}}
     = \sum_h P_h R_{h(A)},$$

where

$$P_h = \sum_i P_{hi} = \sum_j P_{hj}; \qquad R_{h(A)} = \sum_j \frac{P_{hj}}{P_h}\,R_{hj(A)};$$

$$R_{hj(A)} = \sum_i \frac{P_{hi}}{P_h}\,R_{hij}, \quad\text{and}\quad R_{hij} = \sum_k X_{hijk}/P_{hij} = X_{hij}/P_{hij}.$$

The $R_{hj(A)}$ will be referred to as the adjusted ratio for the j-th primary unit.
It is the weighted average within the j-th unit of the substrata ratios, $R_{hij}$,
where the same set of weights $P_{hi}$ is applied to the $R_{hij}$ in each primary unit
within a stratum. The $R_{h(A)}$ is the average, within the h-th stratum, of the
adjusted ratios. Hence

$$EX' = X + \sum_h P_h\,(R_{h(A)} - R_h), \tag{8}$$

where

$$R_h = X_h/P_h, \quad\text{with}\quad X_h = \sum_i \sum_j X_{hij},$$

is the ratio of the aggregate value of the specified characteristic for the elements
in the h-th stratum to the measure of size of that stratum, and where the
population character being estimated, (6), is equal to $X = \sum_h X_h = \sum_h P_h R_h$.

From (8), it is seen that X' is a biased estimate of X, although ordinarily, in
practice, only slightly so. The bias, equal to $\sum_h P_h(R_{h(A)} - R_h)$, is the sum of the
biases for the various primary strata. Under many practical circumstances
some of these will be slightly negative and some slightly positive, with the result
that the total bias will be relatively small. The bias would be nonexistent if
area substratification were not used, or if the form of the sample estimate were
properly modified, but here again, as in the case of substituting biased for
unbiased estimates discussed in Sec. III, the introduction of a slight bias may result
in a substantial reduction in the variance.

A sufficient, although not necessary, condition for the sample estimate (7)
with area substratification to be unbiased is for the ratios $P_{hij}/P_{hj}$ to be
uncorrelated with the $R_{hij}$ within each substratum. Under these circumstances

$$\sum_j \frac{P_{hj}}{P_h}\,\frac{P_{hij}}{P_{hj}}\,R_{hij} = \frac{P_{hi}}{P_h} \sum_j \frac{P_{hj}}{P_h}\,R_{hij},$$

and therefore

$$R_h = \sum_i \sum_j \frac{P_{hij}}{P_h}\,R_{hij} = \sum_i \sum_j \frac{P_{hi}}{P_h}\,\frac{P_{hj}}{P_h}\,R_{hij} = R_{h(A)}.$$

To illustrate, if the measures of size are the 1940 populations, then the sample
estimate will be unbiased if the proportions of the 1940 populations of the
primary sampling units that are in the various substrata are uncorrelated with the
corresponding $R_{hij}$. As indicated earlier these conditions are approximated in
many practical problems, especially if the primary stratification has been carried
out effectively. Moreover, if the conditions are not met approximately, the
bias introduced may still be very small. (See Sec. VII for a numerical
illustration.)
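
The expected value and bias in (8) can be checked numerically for a single stratum. The sketch below is only an illustration: the measures of size and aggregates are hypothetical, the function name is invented for this example, and primary units are assumed to be drawn with probability $P_{hj}/P_h$ as in the specified subsampling system.

```python
from typing import Dict, List

Matrix = List[List[float]]  # indexed [primary unit j][substratum i]


def expected_value_and_bias(P: Matrix, X: Matrix) -> Dict[str, float]:
    """For one stratum, return P_h * R_h(A), the true total X_h, and the bias."""
    units, subs = range(len(P)), range(len(P[0]))
    P_hj = [sum(P[j]) for j in units]
    P_hi = [sum(P[j][i] for j in units) for i in subs]
    P_h = sum(P_hj)
    # Adjusted ratio of each primary unit: fixed weights P_hi / P_h.
    R_hjA = [sum(P_hi[i] / P_h * X[j][i] / P[j][i] for i in subs) for j in units]
    R_hA = sum(P_hj[j] / P_h * R_hjA[j] for j in units)
    X_h = sum(sum(row) for row in X)
    return {"expected": P_h * R_hA, "total": X_h, "bias": P_h * R_hA - X_h}


# Hypothetical measures of size P_hij and aggregates X_hij:
# three primary units, two substrata.
P = [[300.0, 700.0], [600.0, 400.0], [100.0, 900.0]]
X = [[60.0, 350.0], [132.0, 184.0], [21.0, 468.0]]
result = expected_value_and_bias(P, X)

# When every unit has the same substratum proportions P_hij / P_hj, those
# proportions cannot be correlated with R_hij, and the bias vanishes.
P_equal = [[300.0, 700.0], [300.0, 700.0], [300.0, 700.0]]
result_equal = expected_value_and_bias(P_equal, X)
```

With these deliberately uneven hypothetical units the bias is about 1.5 percent of the stratum total; with identical substratum proportions in every unit it is zero, in line with the sufficient condition stated above.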


2. The mean square error of an estimated total for the specified subsampling
system. For the development of the mean square error of X' for the specified
subsampling system, see the Appendix, Section 2. There it is shown that the
mean square error of X' is

$$\sigma_{X'}^2 = \sum_h \sum_i \sum_j P_{hi}^2\,\frac{P_{hj}}{P_h}\,\frac{M_{hij} - m_{hij}}{M_{hij} - 1}\,\frac{\sigma_{hij}^2}{m_{hij}\,\bar{P}_{hij}^2}
  + \sum_h P_h^2 \sum_j \frac{P_{hj}}{P_h}\,(R_{hj(A)} - R_{h(A)})^2
  + \Big[\sum_h P_h\,(R_{h(A)} - R_h)\Big]^2, \tag{9}$$


where

$$\sigma_{hij}^2 = \sum_k (X_{hijk} - \bar{X}_{hij})^2 / M_{hij}$$

is the variance between subsampling units within a substratum of the aggregate
value of a specified characteristic for the subsampling unit, and

$$\bar{P}_{hij} = P_{hij}/M_{hij}$$

is the average measure of size of the subsampling units in the h-i-j-th area.

The first term of (9) is the contribution of the variance between subsampling
units and may be kept small by proper definition of the subsampling units, and,
of course, by increasing the subsampling ratio. The second term of (9) is the
contribution of the variance between primary sampling units within strata;
and the third term is the contribution of the bias, which, as indicated before,
ordinarily will be of negligible size, so that the mean square error and the
variance will be approximately equal.
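
The three contributions just described (within primary units, between primary units, and squared bias) can be computed separately. The following Python sketch does so for a single stratum under stated assumptions: the data and the function name are hypothetical, the subsample sizes are those implied by $t_h = m_{hij}P_{hij}/(M_{hij}P_{hi})$ and are left unrounded, and with only one stratum the bias is squared directly, whereas in (9) the stratum biases are summed before squaring.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # indexed [primary unit j][substratum i]


def mse_terms(P: Matrix, X: Matrix, M: Matrix, var_within: Matrix,
              t_h: float) -> Tuple[float, float, float]:
    """Within-unit, between-unit and squared-bias terms for a single stratum."""
    units, subs = range(len(P)), range(len(P[0]))
    P_hj = [sum(P[j]) for j in units]
    P_hi = [sum(P[j][i] for j in units) for i in subs]
    P_h = sum(P_hj)
    R_hjA = [sum(P_hi[i] / P_h * X[j][i] / P[j][i] for i in subs) for j in units]
    R_hA = sum(P_hj[j] / P_h * R_hjA[j] for j in units)
    R_h = sum(sum(row) for row in X) / P_h

    within = 0.0
    for j in units:
        for i in subs:
            # Subsample size implied by the allocation rule (left unrounded).
            m = t_h * M[j][i] * P_hi[i] / P[j][i]
            fpc = (M[j][i] - m) / (M[j][i] - 1.0)
            p_bar = P[j][i] / M[j][i]
            within += (P_hi[i] ** 2 * P_hj[j] / P_h * fpc
                       * var_within[j][i] / (m * p_bar ** 2))
    between = P_h ** 2 * sum(P_hj[j] / P_h * (R_hjA[j] - R_hA) ** 2 for j in units)
    bias_sq = (P_h * (R_hA - R_h)) ** 2
    return within, between, bias_sq


# Hypothetical inputs; M equals P here, so each subsampling unit has size 1.
P = [[300.0, 700.0], [600.0, 400.0], [100.0, 900.0]]
X = [[60.0, 350.0], [132.0, 184.0], [21.0, 468.0]]
M = [[300.0, 700.0], [600.0, 400.0], [100.0, 900.0]]
var_within = [[0.04, 0.30], [0.05, 0.40], [0.02, 0.25]]   # sigma^2_hij
within, between, bias_sq = mse_terms(P, X, M, var_within, t_h=0.02)
within_double, _, _ = mse_terms(P, X, M, var_within, t_h=0.04)
```

Doubling the subsampling ratio lowers only the within-unit term; the between-unit and bias terms are untouched, which is why the design of the primary units carries so much weight in the discussion that follows.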

It is the variance between primary sampling units that contributes most
heavily to the total variance in many subsampling situations, and it is on this
contribution that the modifications proposed in this paper have their principal
effect. The effect of area substratification is seen by comparing the variance
between primary units given above with that obtained if area substratification
were not used but other aspects of the design remained unchanged. In this
event the variance between primary units involves the variance of the ratio
$R_{hj} = \sum_i X_{hij}/P_{hj} = X_{hj}/P_{hj}$, instead of the variance of the adjusted ratio,
$R_{hj(A)}$.

The relationship between the variance of $R_{hj}$ and that of $R_{hj(A)}$ within the
h-th primary stratum is given by

$$\sigma_{R_{hj}}^2 = \sigma_{R_{hj(A)}}^2 + \sigma_{R_{hj} - R_{hj(A)}}^2 + 2\rho\,\sigma_{R_{hj(A)}}\,\sigma_{R_{hj} - R_{hj(A)}}, \tag{10}$$

where $\sigma_{R_{hj} - R_{hj(A)}}^2$ is the variance of the difference between the adjusted and the
unadjusted ratios, and $\rho$ is the correlation between the adjusted ratio and the
amount of the adjustment. Thus, if the correlation is near zero or positive,
there will be a gain from the introduction of area substratification, although there
may be a loss if the correlation is highly negative. Essentially, the condition
for $\rho$ being equal to or near zero is the same as that for the sample estimate being
unbiased, namely, that the $P_{hij}/P_{hj}$ be uncorrelated or only slightly correlated
with the $R_{hij}$ within each substratum. 3

The variance of $R_{hj(A)}$ rather than that of $R_{hj}$ occurs in the variance of X'
because the subsampling numbers were allocated proportionate to the $P_{hi}$,
no matter what primary sampling unit happened to be selected for inclusion in
the sample. The ratio $R_{hj}$, like $R_{hj(A)}$, may be regarded as the weighted average
of the $R_{hij}$, but with the weights equal to $P_{hij}$ instead of $P_{hi}$, and thus varying
from primary unit to primary unit. It would appear, therefore, from the
relationship of the variances given above, that if the substrata are effective, and if
the $P_{hij}$ are highly correlated with the actual sizes of the substrata, the weighted
average using fixed weights in all primary units should have a considerably
smaller variance than that using variable weights. This turns out to be the
case in many practical situations, some illustrations of which will be given later
(see Sec. VII).
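
The decomposition in (10) writes the unadjusted ratio as the adjusted ratio plus an adjustment, so it can be verified on any set of primary units. The Python sketch below does this for one hypothetical stratum. It assumes that the variances and the correlation are taken over primary units with weights $P_{hj}/P_h$ (the selection probabilities); the data and helper names are illustrative only.

```python
import math
from typing import List


def weighted_mean(values: List[float], weights: List[float]) -> float:
    return sum(w * v for v, w in zip(values, weights))


def weighted_cov(a: List[float], b: List[float], weights: List[float]) -> float:
    mean_a, mean_b = weighted_mean(a, weights), weighted_mean(b, weights)
    return sum(w * (x - mean_a) * (y - mean_b) for x, y, w in zip(a, b, weights))


# Hypothetical stratum: rows are primary units j, columns are substrata i.
P = [[300.0, 700.0], [600.0, 400.0], [100.0, 900.0]]
X = [[60.0, 350.0], [132.0, 184.0], [21.0, 468.0]]

P_hj = [sum(row) for row in P]
P_hi = [sum(col) for col in zip(*P)]
P_h = sum(P_hj)
weights = [p / P_h for p in P_hj]            # selection probabilities P_hj / P_h

R_hj = [sum(x_row) / p for x_row, p in zip(X, P_hj)]                # unadjusted
R_hjA = [sum(P_hi[i] / P_h * x / p
             for i, (x, p) in enumerate(zip(x_row, p_row)))
         for x_row, p_row in zip(X, P)]                              # adjusted
adjustment = [r - ra for r, ra in zip(R_hj, R_hjA)]

var_unadjusted = weighted_cov(R_hj, R_hj, weights)
var_adjusted = weighted_cov(R_hjA, R_hjA, weights)
var_adjustment = weighted_cov(adjustment, adjustment, weights)
rho = (weighted_cov(R_hjA, adjustment, weights)
       / math.sqrt(var_adjusted * var_adjustment))

# Right-hand side of (10).
rhs = (var_adjusted + var_adjustment
       + 2.0 * rho * math.sqrt(var_adjusted) * math.sqrt(var_adjustment))
```

In this hypothetical the correlation is positive, and the adjusted ratio has a much smaller variance between primary units than the unadjusted ratio, which is the situation in which area substratification yields a gain.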

3. The mean square error of ratio estimates for the specified subsampling
system. The need for estimating a ratio from a sample arises in two cases:
first, when the ratio is the population character for which an estimate is desired,
and second, when the application of a ratio from the sample to a known total
uses additional available information for obtaining an improved estimate of the
desired total.

Ratio estimates are desired as an end-result when, for example, the change in
a characteristic from one time to another is being considered. Thus, if Y' is
the estimated total income of farm workers at one date, and X' the corresponding
estimated total income at a second date, then r' = X'/Y' is an estimate of the
relative change in the total income of farm workers over the period of time
covered. Similarly, the estimate of a percentage such as the percentage of the
workers unemployed will involve the ratio of two random variables from the
sample. Ratio estimates from a sample may be particularly useful in instances
where the reliability of the ratio estimate is greater than the reliability of the
estimate of either the numerator or the denominator, as is frequently the case.

3 Actually, a sufficient, although not necessary, condition for $\rho$ to be equal to zero is that
$P_{hij}/P_{hj}$ be uncorrelated with both the ratio $R_{hij}$ and the cross-product $R_{hij}R_{hi'j}$ for all
pairs of substrata.

Ratio estimates may be used as a means of obtaining an estimated aggregate
value of a specified characteristic, if Y, the aggregate value of a second
characteristic highly correlated with X, is known exactly from independent sources, and
X' and Y', estimates of X and Y respectively, are available from the sample.
Thus

$$X'' = [X'/Y']\,Y = r'Y \tag{11}$$

is an estimate of the aggregate value of the specified characteristic. If the
correlation, in successive samples, between X' and Y' is sufficiently high, the ratio
estimate will be a more efficient estimate of X than will X', the simple estimated
total given earlier (7), but X' will prove the more reliable estimate when the
correlation is low. 4 Thus, X'', when the correlation between X' and Y' is
sufficiently high, makes use of more of the relevant available information for
estimating X than does X'.
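
How high the correlation must be follows from the approximation in footnote 4: the squared coefficient of variation of r' falls below that of X' exactly when the correlation exceeds $V_{Y'}/(2V_{X'})$. The Python sketch below illustrates this; the function names and all numerical values are hypothetical, and the formula is the first-order approximation of footnote 4 rather than an exact result.

```python
def ratio_estimate(x_prime: float, y_prime: float, y_known: float) -> float:
    """X'' = (X'/Y') * Y, the ratio estimate (11)."""
    return x_prime / y_prime * y_known


def rel_variance_of_ratio(v_x: float, v_y: float, rho: float) -> float:
    """Approximate squared coefficient of variation of r' = X'/Y' (footnote 4)."""
    return v_x ** 2 + v_y ** 2 - 2.0 * rho * v_x * v_y


def min_correlation_for_gain(v_x: float, v_y: float) -> float:
    """Correlation above which the ratio estimate beats X': rho > V_y / (2 V_x)."""
    return v_y / (2.0 * v_x)


# Hypothetical coefficients of variation for X' and Y'.
v_x, v_y = 0.05, 0.04
threshold = min_correlation_for_gain(v_x, v_y)        # about 0.4
high = rel_variance_of_ratio(v_x, v_y, rho=0.8)       # below v_x ** 2
low = rel_variance_of_ratio(v_x, v_y, rho=0.2)        # above v_x ** 2

# Hypothetical sample totals and known population total Y.
x_double_prime = ratio_estimate(x_prime=980.0, y_prime=4900.0, y_known=5000.0)
```

Because Y is a known constant, X'' has the same coefficient of variation as r', so the comparison of `high` and `low` against `v_x ** 2` is a comparison of X'' against X'.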

The application of ratio estimates to the specified subsampling system is
considered below.

(a) The estimated ratio and its mean square error. The estimate of the
population ratio r = X/Y is:

$$r' = \frac{X'}{Y'} = \frac{\displaystyle\sum_{h}^{L} \frac{1}{t_h} \sum_{i}^{S_h} \sum_{k}^{m_{hij}} X_{hijk}}{\displaystyle\sum_{h}^{L} \frac{1}{t_h} \sum_{i}^{S_h} \sum_{k}^{m_{hij}} Y_{hijk}}, \tag{12}$$

where X' is given in (7) above, and Y' is a similar estimate of the total value of
a second characteristic. The mean square error of r' is approximately


$$\begin{aligned}
\sigma_{r'}^2 \approx \frac{1}{Y^2}\Bigg\{
 &\sum_h \sum_i \sum_j P_{hi}^2\,\frac{P_{hj}}{P_h}\,\frac{M_{hij} - m_{hij}}{M_{hij} - 1}\,
   \frac{\sum_k Y_{hijk}^2\,(r_{hijk} - r_{hij})^2}{M_{hij}\,m_{hij}\,\bar{P}_{hij}^2} \\
 &+ \sum_h \sum_i \sum_j P_{hi}^2\,\frac{P_{hj}}{P_h}\,\frac{M_{hij} - m_{hij}}{M_{hij} - 1}\,
   \frac{\sigma_{hij\cdot Y}^2\,(r_{hij} - r)^2}{m_{hij}\,\bar{P}_{hij}^2} \\
 &+ \sum_h P_h^2 \sum_j \frac{P_{hj}}{P_h}\,R_{hj(A)\cdot Y}^2\,(r_{hj(A)} - r_{h(A)})^2 \\
 &+ \sum_h P_h^2\,(r_{h(A)} - r)^2 \sum_j \frac{P_{hj}}{P_h}\,(R_{hj(A)\cdot Y} - R_{h(A)\cdot Y})^2
\Bigg\},
\end{aligned} \tag{13}$$

where

$X_{hijk}$ = the aggregate value of a specified characteristic for the elements in
      the k-th subsampling unit within the h-i-j-th area, for which a total
      is to be estimated;

$Y_{hijk}$ = the aggregate value of a second specified characteristic for the
      elements in the same subsampling unit, and for which the total in
      the population is known;

$Y_{hij} = \sum_k Y_{hijk}$, and $Y_h = \sum_i \sum_j Y_{hij}$;

$\sigma_{hij\cdot Y}^2 = \sum_k (Y_{hijk} - \bar{Y}_{hij})^2 / M_{hij}$ is the variance of the sampling units in the h-i-j-th
      area with respect to the second characteristic, and $\bar{Y}_{hij} = Y_{hij}/M_{hij}$;

$R_{hj(A)\cdot Y} = \sum_i (P_{hi}/P_h)(Y_{hij}/P_{hij})$ is the adjusted average of the $Y_{hij}$; and

$r_{hijk} = X_{hijk}/Y_{hijk}$, $r_{hij} = X_{hij}/Y_{hij}$, etc., are the ratios of the X to the Y for the
      areas indicated by the subscripts, and

$r_{hj(A)} = R_{hj(A)}/R_{hj(A)\cdot Y}$ and $r_{h(A)} = R_{h(A)}/R_{h(A)\cdot Y}$ are the ratios of the adjusted
      ratios for X and Y indicated by the subscripts;

and the remaining symbols are as defined in the sections above where the
expected value and variance of X' are given.

4 The variance of the ratio of random variables of the form r' = X'/Y' is approximately
$\sigma_{r'}^2 = r^2(V_{X'}^2 + V_{Y'}^2 - 2\rho_{X'Y'}V_{X'}V_{Y'})$, where V indicates the coefficient of variation of the
variable designated by the subscript, and $\rho_{X'Y'}$ is the correlation. Hence, if $\rho_{X'Y'}$ is
sufficiently large, $V_{r'}^2$ will be less than $V_{X'}^2$. The size of $\rho_{X'Y'}$ required depends on the relative
magnitudes of the coefficients of variation of X' and Y'.

The first and third terms of (13) are, ordinarily, the principal contributing
terms. The second and fourth terms contain contributions due to the variation
between the means of the substrata and the primary strata respectively, even
though the sample was stratified with respect to these classes. In some
instances, the contributions of these terms will be important. The between
strata contributions arise because the primary and subsampling units vary in
size with respect to the character Y.

This formula for the mean square error of a ratio is approximately equal to the
one more commonly used given in footnote 4. The two formulas, both of which
are approximations, would be identical if certain terms which are ordinarily
negligible were retained in (13). This latter formula has the advantage of
indicating the effect of different aspects of the design of the sample on the variance
of the ratio. The derivation of this approximate variance formula is given in
the Appendix, Section 3, together with an indication of the accuracy of the
approximation.
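
The structure of (13) can be motivated by the usual first-order linearization of a ratio. What follows is only a sketch under that assumption, written for a single h-i-j-th area. The auxiliary variable $Z_{hijk}$ is introduced here for illustration and does not appear in the paper, whose own derivation is in the Appendix, Section 3.

```latex
% First-order approximation: r' - r \approx (X' - rY')/Y, so that the mean square
% error of r' is approximately that of the estimated total of Z = X - rY,
% divided by Y^2.
\begin{align*}
Z_{hijk} &= X_{hijk} - r\,Y_{hijk} = Y_{hijk}\,(r_{hijk} - r), \\
Z_{hijk} - \bar{Z}_{hij}
  &= Y_{hijk}\,(r_{hijk} - r_{hij}) + (Y_{hijk} - \bar{Y}_{hij})\,(r_{hij} - r), \\
\frac{1}{M_{hij}} \sum_k (Z_{hijk} - \bar{Z}_{hij})^2
  &= \frac{1}{M_{hij}} \sum_k Y_{hijk}^2\,(r_{hijk} - r_{hij})^2
   + \sigma_{hij\cdot Y}^2\,(r_{hij} - r)^2 \\
  &\quad + \frac{2\,(r_{hij} - r)}{M_{hij}}
     \sum_k Y_{hijk}\,(r_{hijk} - r_{hij})\,(Y_{hijk} - \bar{Y}_{hij}).
\end{align*}
```

The first two terms on the right have the form of the within-area factors in the first and second terms of (13). The cross-product term is of the kind the text describes as ordinarily negligible, though whether it is exactly the term dropped in the paper is an assumption here.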

(b) The estimated totals and their mean square errors. As mentioned earlier,
two estimates of X, the aggregate value of a given characteristic for all
elements, are X' (7) and X'' (11). The mean square error of X' is given by (9)
and that of X'' is simply equal to $Y^2\sigma_{r'}^2$, where $\sigma_{r'}^2$ is given approximately by (13).

The decision as to whether to use X' or X'' as an estimate of X depends, of
course, in the first instance, on whether Y is known, and in the second instance,
on the relative magnitudes of the respective mean square errors given in (9)
and (13). These may be approximated from prior knowledge concerning the
relationships in the population under investigation, or they may be estimated
from preliminary sample investigations. However, in instances where there is
a positive correlation between the $X_{hijk}$ and the $Y_{hijk}$ within substrata, it is fairly
safe to assume that if the information necessary for the ratio estimate is
available, there will be little to lose and possibly considerable to gain from its use.

The use of (11) instead of (7) is often desirable when Y in (11) is the aggregate
value of the actual sizes of the primary units, and Y' is its estimate. This is
particularly so if the measures of size used are not fairly precise measures of the
actual sizes, and if, at the same time, the actual size is highly correlated with
the character being estimated, in which case the use of ratio estimates will yield
gains in both the between primary unit contribution and the within primary unit
variance. (See Sec. VII for numerical illustrations.) However, if the measures
of size are identical with the actual sizes (i.e., $P_{hijk} = Y_{hijk}$) the last two terms of
(13) are identical with the between primary unit contribution to the variance of
X' (9), and only the within primary unit variance is affected by the ratio estimate.

While it is fairly safe in practice, if Y is known, to make use of X'' instead of
X' as the estimate of X, some care must be exercised to make sure that the
$X_{hijk}$ has at least a moderately high average correlation with the $Y_{hijk}$, where
the correlations considered are those within substrata within primary sampling
units. If this correlation is low, and if the size of the subsampling unit varies
considerably, the ratio estimate may be considerably less efficient than the simple
total estimate. On the other hand, if the measures of size of the various
substrata and of the primary sampling units are fairly close measures of the actual
size, and if the subsampling units have been carefully defined so that they do
not vary too greatly in size, the two estimates are likely to have about the same
efficiency.

VI—SOME PHYSICAL PROPERTIES OF FREQUENTLY OCCURRING 
POPULATIONS THAT ARE BASIC TO THE SAMPLING
PRINCIPLES RECOMMENDED IN THIS PAPER 

Many actual populations are characterized by the following physical properties:

(i) The elements within a cluster are positively correlated with regard to a
specified characteristic.

(ii) Clusters containing large numbers of elements have greater internal
heterogeneity than clusters containing small numbers of elements.

(iii) Increasing the size of the cluster brings in correlated elements (e.g., in
population or agriculture surveys larger clusters are formed by including
households or farms in adjacent areas).

The first of these properties is recognized implicitly in the literature where the
losses of efficiency through the use of large clusters as sampling units are
frequently cited. In our experience the second and third properties hold just as
commonly in actual populations, and ordinarily for the same populations for
which the first property holds.

The presence of these physical properties in combination within strata leads
to the following mathematical relationships that have been used throughout
this paper:

(a) The sizes of the primary sampling units, $N_{hj}$, are negatively correlated
with the $\rho_{hj}$, the intra-class correlations between elements within the
units;

(b) The $N_{hj}$ and $N_{hj}\rho_{hj}$ are positively correlated;

(c) The $N_{hj}$ and $\sigma_{hj}^2$ are positively correlated;

(d) The $N_{hj}$ and $\sigma_{hj}^2/N_{hj}$ are negatively correlated.

The use of these relationships has determined most of the choices among
alternative procedures throughout this paper. The relationships, of course, do
not necessarily hold, and exceptions to them can be found [5]. The frequent
occurrence of populations characterized by such properties justifies further
research on the more effective use of these and other properties that may be found
to hold.

VII—SOME APPLICATIONS OF THE PRINCIPLES DESCRIBED 
IN THIS PAPER TO AN ACTUAL SAMPLING PROBLEM 

The analyses summarized below were carried out for the purpose of deciding
between alternative sampling procedures in the revision of a monthly national
sample for labor force and other characteristics. Budgetary and administrative
restrictions made it necessary to confine the field operations to a limited number
of administrative centers scattered over the country, from which a sample of
less than one-tenth of one percent of the population of the United States was
to be drawn.

The original sample (the one to be revised) was of a usual subsampling design
in which counties were used as the primary sampling units, and households or
small clusters of households were used as the subsampling units. In the revised
sample contiguous counties were combined wherever administratively feasible,
to form more heterogeneous primary units than the individual counties.
Approximately 2000 primary sampling units were formed from the 3000 counties in
the United States. The combinations of counties, the primary stratification,
the area substratification, and the measures of size, were determined on the basis
of 1940 Decennial Census data together with more recent data where available. 5
The applications of the various principles suggested in this paper have been
evaluated by estimating 1930 Census labor force characteristics from a sample
that was stratified on the basis of 1940 and more recent data. This constituted
a particularly severe test of some of the methods, because of the substantial
shifts that had taken place during the 10-year interval between 1930 and 1940.

5 See [11] for a full description of the proposed revised sample, including an outline of the
criteria of stratification used. That paper may be useful as a simple description of an
application of the specified subsampling system.

The analyses to be summarized in this section are concerned primarily with 
the gains obtainable under favorable circumstances by the introduction of three 
sampling principles, namely, 

(1) enlarged primary units (see Sec. IV-1);

(2) the sampling of primary units with probability proportionate to measures 
of their size (see Sec. IV-2); 

(3) area substratification (see Sec. IV-3). 

Some comparisons are also given to illustrate the effect of using alternative
sample estimating formulas. Computations have been made for six of the
principal items that are currently being included in a monthly report of the labor
force; namely, total numbers of male and female workers, total numbers of male
and female agricultural workers, and total numbers of male and female
nonagricultural workers. The comparisons between alternative systems have been
made holding constant both the primary stratification criteria and the expected
numbers of persons to be drawn into the sample.

The percentage gains given below are the reductions in the between primary
unit contributions (which include the bias contributions) to the mean square
error. 6 Except where otherwise specified, the sample estimate used is given
by (7).

1. Gains obtained by introducing enlarged primary units. The gains obtained
by using enlarged primary units are calculated by comparing the mean square
errors arising from the sampling design in which individual counties are primary
units with the mean square errors arising from the design in which combinations
of counties are the primary units. In both designs, the primary units are drawn
with equal probabilities and no area substratification is used. For this
comparison, preliminary computations have been completed for only a limited number
of strata and for two of the labor force items given above; namely, total male
workers and total female workers. The reduction in the sampling errors
obtained by introducing enlarged primary units is estimated to be 48 per cent for
total male workers and 26 per cent for total female workers.

6 The contribution of the variance within the primary units to the total mean square error
was relatively small in all instances, and practically unaffected by the introduction of the
various principles.

2. Further gains obtained by introducing probability proportionate to measures
of size. The further gains obtained by using the principle of sampling with
probability proportionate to measures of size are calculated by comparing the
mean square errors arising from the design in which the units are drawn with
equal probability with the mean square errors arising from the design in which
the units are drawn with probability proportionate to measures of size. In
both the designs, the primary units are combinations of counties, and in neither
of them is area substratification used. The estimated per cent gains are as
follows:

    Total Workers      Agricultural Workers    Nonagricultural Workers
    Male    Female       Male    Female           Male    Female
     50        8          77        6              19       21

The gains reflect both decreases in the sampling variance and the elimination
of the bias which arises when the primary units are drawn with equal
probabilities. 7

3. Further gains obtained by introducing area substratification. The further
gains obtained by using the principle of area substratification are calculated by
comparing the mean square errors for the design in which area substratification
is not used, with those for the design in which area substratification is introduced.
In both these designs the primary units are combinations of counties, and are
drawn with probability of selection proportionate to measures of their sizes.
The estimated per cent gains are as follows:

    Total Workers      Agricultural Workers    Nonagricultural Workers
    Male    Female       Male    Female           Male    Female
      6       31          46       51              32       22

4. Gains obtained by the integration of the above principles into a single
subsampling system (the specified subsampling system). The gains obtained by
using all three principles are calculated by comparing the mean square errors for
the specified subsampling system (in which all three principles are used) with
the mean square errors for the system in which none of these principles is used.
In the specified subsampling system, combined counties are the primary units,
the primary units are drawn with probability proportionate to measures of their
size, and area substratification is used. In the other system, the primary units
are individual counties, the sampling is done with equal probabilities and area
substratification is not used. Preliminary computations for this comparison
are available for only 2 of the 6 labor force items; namely, total male and total
female workers. The estimated gains were 76 per cent for male workers and 53
per cent for female workers.

7 As indicated before, estimate (7) is used in both designs compared above. This
estimate is unbiased for the design in which the primary units are drawn with probability
proportionate to measures of size, but is biased for the design in which they are drawn with
equal probabilities. However, for the latter design, the biased estimate is usually much
more efficient than the best linear unbiased estimate. For the six labor force items, the
best linear unbiased estimate gives rise to variances that are several times as large as the
mean square errors for the biased estimate.

Calculations are available for all 6 items to measure the gains obtained by
the use of the last two of the principles in combination; namely, probability
proportionate to measures of size and area substratification. For measuring
these gains, the systems are as described above, except that in both designs the
primary units are combinations of counties. The estimated per cent gains are
as follows:

    Total Workers      Agricultural Workers    Nonagricultural Workers
    Male    Female       Male    Female           Male    Female
     54       37          88       54              45       39

While both the specified subsampling system and the alternative to which it was
just compared are biased designs, the bias in the specified system is appreciably
smaller than the bias in the latter. For example, while the bias of the specified
system in the estimation of total male workers was less than one-half per cent
of the true total male workers, the bias for the alternative design in the
estimation of the same population character was more than one and one-half per cent.
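
The reported joint gains are close to what the separate gains imply if successive percentage reductions are assumed to compound multiplicatively. The paper does not state this rule, so the Python sketch below is only a consistency check under that assumption, using the percentages quoted in Sec. VII-1 to VII-4. The function and variable names are illustrative.

```python
from typing import List, Sequence


def combined_gain(gains_percent: Sequence[float]) -> float:
    """Overall percent reduction when successive percent reductions compound."""
    remaining = 1.0
    for gain in gains_percent:
        remaining *= 1.0 - gain / 100.0
    return 100.0 * (1.0 - remaining)


items = ["total male", "total female", "agricultural male",
         "agricultural female", "nonagricultural male", "nonagricultural female"]
gain_pps = [50, 8, 77, 6, 19, 21]           # Sec. VII-2: probability prop. to size
gain_substrat = [6, 31, 46, 51, 32, 22]     # Sec. VII-3: area substratification
reported_joint = [54, 37, 88, 54, 45, 39]   # Sec. VII-4: last two principles

computed_joint: List[float] = [combined_gain(pair)
                               for pair in zip(gain_pps, gain_substrat)]

# All three principles, using the enlarged-unit gains of Sec. VII-1.
all_three_male = combined_gain([48, 50, 6])
all_three_female = combined_gain([26, 8, 31])
```

Under this assumption the computed joint gains fall within about one percentage point of the reported figures, and the three-principle gains round to the reported 76 and 53 per cent. Since the published percentages are rounded and some rest on preliminary computations for a limited number of strata, closer agreement should not be expected.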

6. The choice of estimate to use with the specified subsampling system. The 

simple estimate (7) given for the specified subsampling system may be improved 
on by the use of regression techniques (see Sec. Ill), However, such techniques 
may require a great deal of clerical work, so that they frequently cannot be used 
in practice. As indicated in the last part of Sec. V, however, if certain inde¬ 
pendent information such as a knowledge of the total population is available, a 
simple ratio estimate of the form of (12) may sometimes introduce gams over 
(7). The use of the ratio estimate may be particularly desirable when the 
correlation between the measures of size and the actual sizes of the primary 
sampling units is only moderately high, and when, at the same time, the actual 
sizes are highly correlated with the values for the character being estimated. 
A small-scale experiment in the sampling for labor force items indicated that for 
estimating total male workers for 1930, both the variance between primary units 
and the variance within primary units for the ratio estimate (12) were 
approximately one-half that for the simple estimate (7). The use of the ratio estimate 
had very little effect in the estimation of the remaining five labor force 
characteristics. The reduction in variance of the total male employment figure was 
brought about because migration since 1930 reduced the correlation between 
the 1930 and 1940 sizes, and furthermore, the number of male workers is highly 
correlated with the total population. Similar reductions for the variances of 
the other five items were not obtained because the correlations with actual sizes 
for the other items were not as high. 

6. Some final remarks. The gains just obtained arose from application of 
the sampling principles enumerated above. The situations that these principles 
were applied to are favorable, but are frequently met in practice. The principles 
differ in their effect depending on the particular attributes of the population 
being studied. The use of enlarged primary units may be desirable whenever 
the enlarged units are internally more heterogeneous than are the smaller units. 
The selection of primary units with probability proportionate to size is desirable 
for the general classes of populations described in Sec. VI whenever the primary 
units vary considerably in size. The use of area substratification is limited to 
sampling situations where large primary units are used. The joint effect of all 
three principles shows to greatest advantage when subsampling is used, the 
primary units are large but variable in size, and the number of primary units 
included in the sample is limited by cost or administrative conditions. The 
types of estimates described in Sec. III may be effective in a large number of 
physical situations other than those mentioned in this paper. 

APPENDIX 

1. The effect of the consolidation of the primary units on the sampling 
variance (see Sec. IV-1). Let $\bar X'_1 = \sum_j \sum_k X_{jk}/qn$ be the average for the sample 
where the primary units are the original units and where $X_{jk}$ is the value of the 
$k$-th element in the $j$-th primary unit; $q$ is the number of primary units in the 
sample, and $n$ is the number of elements sampled from each of the $q$ primary 
units. The variance of $\bar X'_1$ is 

(14)    $\sigma^2_{\bar X'_1} = \frac{N - n}{(N - 1)nq}\,\sigma^2_{1w} + \frac{Q - q}{(Q - 1)q}\,\sigma^2_{1b},$

where $Q$ is the number of original primary units in the population; $N$ is the 
number of elements in each original primary unit; $\sigma^2_{1w} = \sum_j \sum_k (X_{jk} - \bar X_j)^2/QN$ 
is the variance within the original primary units, with $\bar X_j = \sum_k X_{jk}/N$; and 
$\sigma^2_{1b} = \sum_j (\bar X_j - \bar X)^2/Q$ is the variance between the original primary units, with 
$\bar X = \sum_j \bar X_j/Q$. Let 

(15)    $\sigma^2 = \sum_j \sum_k (X_{jk} - \bar X)^2/QN = \sigma^2_{1w} + \sigma^2_{1b}.$

Then 

(16)    $\sigma^2_{1b} = \sigma^2 [1 + \rho_1 (N - 1)]/N,$

where $\rho_1 = \sum_j \sum_{k \ne l} (X_{jk} - \bar X)(X_{jl} - \bar X) / [QN(N - 1)\sigma^2]$ is the intra-class correlation⁸ between elements in 
the original units. 

From (15) and (16) 

(17)    $\sigma^2_{1w} = \frac{N - 1}{N}\,\sigma^2 (1 - \rho_1).$

Hence 

(18)    $\sigma^2_{\bar X'_1} = \frac{N - n}{N}\,\frac{\sigma^2}{nq}\,(1 - \rho_1) + \frac{Q - q}{(Q - 1)qN}\,\sigma^2 [1 + \rho_1 (N - 1)].$
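
The decomposition of the total variance into within-unit and between-unit parts, and the relation between the between-unit variance and the intra-class correlation, can be checked numerically. The following Python sketch is only an illustration: the function name and the sample data are hypothetical, and it assumes primary units of equal size $N > 1$ and a nonzero total variance.

```python
def variance_components(x):
    """Return (within, between, total, rho) for a list of Q units of N values each.

    within  = sum_j sum_k (X_jk - Xbar_j)^2 / (Q N)
    between = sum_j (Xbar_j - Xbar)^2 / Q
    total   = sum_j sum_k (X_jk - Xbar)^2 / (Q N)
    rho is recovered from  between = total * [1 + rho (N - 1)] / N.
    """
    Q = len(x)
    N = len(x[0])
    unit_means = [sum(unit) / N for unit in x]
    grand_mean = sum(unit_means) / Q
    within = sum((v - m) ** 2 for unit, m in zip(x, unit_means) for v in unit) / (Q * N)
    between = sum((m - grand_mean) ** 2 for m in unit_means) / Q
    total = sum((v - grand_mean) ** 2 for unit in x for v in unit) / (Q * N)
    rho = (N * between / total - 1.0) / (N - 1)
    return within, between, total, rho


# Hypothetical population: Q = 3 primary units with N = 3 elements each.
population = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0], [0.0, 1.0, 1.0]]
w, b, t, rho = variance_components(population)
print("within + between - total:", w + b - t)
print("within - (N-1)/N * total * (1 - rho):", w - (2.0 / 3.0) * t * (1.0 - rho))
```

Both printed differences should vanish up to rounding, since the second relation follows algebraically from the first together with the definition of the intra-class correlation used here.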


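Similarly, the variance of $\bar X'_2$ is 

(19)    $\sigma^2_{\bar X'_2} = \frac{CN - n}{CN}\,\frac{\sigma^2}{nq}\,(1 - \rho_2) + \frac{Q - Cq}{(Q - C)qCN}\,\sigma^2 [1 + \rho_2 (CN - 1)],$

where $\bar X'_2$ is the mean for the enlarged primary units, $\rho_2$ is the intra-class 
correlation between elements in the enlarged primary units and $C$ is the number of 
original units combined to form each enlarged unit. Then 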


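(20)    $\sigma^2_{\bar X'_1} - \sigma^2_{\bar X'_2} = \frac{\sigma^2}{qN} \left[ \frac{(q - 1)(C - 1)}{(Q - 1)(Q - C)} + a_1 \rho_1 - a_2 \rho_2 \right],$

where $a_1 = \frac{(Q - q)(N - 1)}{Q - 1} - \frac{N - n}{n}$ and $a_2 = \frac{(Q - Cq)(CN - 1)}{(Q - C)C} - \frac{CN - n}{Cn}$. 

Since 

$a_1 - a_2 = \frac{(C - 1)(q - 1)(QN - 1)}{(Q - 1)(Q - C)} \ge 0$

and 

$\frac{(q - 1)(C - 1)}{(Q - 1)(Q - C)} \ge 0,$

then a gain is brought about by enlarging primary units whenever $\rho_1 > \rho_2$, 
where $\rho_1$ and $\rho_2$ are both positive. 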


2. Comparison of variances of certain alternative subsampling systems where 
the primary units are of unequal sizes. The development of (4), the formula 
for the difference between the variances of sample estimates compared in Sec. 
IV-2, is given below. We shall confine ourselves to the simple case where only 
one primary sampling unit is drawn into the sample from each stratum. Let 

(21)    $X' = \sum_h N_h \bar X'_h / N$

be the sample estimate used for each of the three designs to be compared, where 
$\bar X'_h = \bar X'_{hj} = \sum_k X_{hjk}/n_{hj}$, and $X_{hjk}$ is the value of the $k$-th element in the $j$-th 
primary unit in the $h$-th stratum; $L$ is the number of strata; $n_{hj}$ is the number 
of elements drawn into the sample from the $j$-th primary unit in the $h$-th stratum, 
with $N_{hj}$ the corresponding total number; $N_h = \sum_j N_{hj}$, with $Q_h$ the number of 
primary units in the $h$-th stratum; and $N = \sum_h N_h$. If the subsampling within a 
stratum is of a constant proportion, $C$, as in the first of the subsampling systems 
mentioned, $n_{hj}$ in the above estimate is equal to $C N_{hj}$. If the subsampling 
within a stratum is of a constant number, as in the second subsampling system 
mentioned, as well as in the recommended system, $n_{hj}$ is equal to 
$n_h = C \sum_j N_{hj}/Q_h = C \bar N_h$. 

⁸ For definitions and properties of intra-class correlations, see Secs. 38-40 of Statistical 
Methods for Research Workers, R. A. Fisher, and [5]. 

We shall denote the sample estimate for the first design by $X'_1$, that for the 
second design by $X'_2$, and that for the recommended design by $X'_3$. 

The expected values of the sample estimates for the first two designs, $X'_1$ 
and $X'_2$, are the same, and are equal to 


Eft.Eft.s-l 

N h Qh i Nh, k nh, N h Qh , 

where Xh , = ^ Jf/oi/Ahj. Thus, since X is not, m general, equal to ^Xhij/^Nh, 

_h _ _ h,t,i h.l 

= X, both X[ and Xi are biased estimates of X. 

For the recommended design, in which the primary unit is drawn with prob¬ 
ability of selection proportionate to size and a constant number taken from the 
sampled units within a stratum, the expected value of the sample estimate is 

h 7 Jx h hj A 7 

and therefore the estimate for the recommended design is unbiased. 

The mean square error of Zi is 


(23) 


2 






+ (x- X) 2 - i £ Nl(Xh - XhY 

where alj = (Xhjt — Xh,Y/Nhj is the variance between elements within the 
h _ 

j-th primary sampling unit of the 5-th stratum, X h = 22 Xhj/Qh,Xh) = Xhj/Nm, 
and X h = J2 Yj Xhjk/12 Nhi = £ NhiXhj/lZXhj. The first term in the square 

i h I i l 

bracket of (23) is the contribution of the variance within primary units. The 
second term in the square bracket is an approximation to the mean square error 
between primary units, and the remaining terms give the error in this approximation. 
The mean square error of $X'_2$ is given by the same formula but with $n_{hj}$ replaced 
by $n_h$. 


The difference between and «rV is 

1 ■* 2 


(24) 



J_ V V 2 ^ J_ J_\ 

<W 2 h Qht ahl N hl - 1 \N h , nJ ’ 


which will be positive if aljNh, is negatively correlated with Nh,, as is almost 
invariably the case in practice (see Sec, VI). Thus, since cr 2 ^' ordinarily is 
larger than a j/ , it will suffice to compare o- 2 ^ with or 2 ^ to show that the recom¬ 
mended subsampling system is more efficient than either of the first two men¬ 
tioned. 


The variance for the recommended design is 


(25) 


2 




Nhj Nhl ~ fib oil L. V Nhl 
Nh Nh, - 1 »iY Nl 



For comparing the mean square error of Nl with the variance of Xl we shall 
define 


*-£[ifc.-v- s 4 ri ] 

as the intra-class correlation coefficient between elements within the j-th primary 
unit, where ol is the variance between all elements within the h-th stratum. In 
this comparison, the terms outside the square brackets in (23), have been ig¬ 
nored because their contribution to the mean square error is either positive or 
negligible. Then, 


(26) 



1 Nhj <tI, /, Nhi\ 

N*v QhXrNh, - I n,\ Nh) 



The second term of this difference was given in Sec, IV-2 as the approximate 
difference, and the first term was neglected. To examine the relative magnitudes 
of the two terms we shall write 


(27) — * ■— alj = ol{l - 5 h j). 

— 1 

Then 


( 28 ) 'Wii-iEf; 


2 

2 




For the general class of populations given in Sec. VI the covariance between 
$\delta_{hj}$ and $N_{hj}$, and also that between $\rho_{hj}$ and $N_{hj}$, will be negative. Moreover, 
in many practical problems of this class the two covariances will be of 
approximately the same magnitude. In such instances the first term of (27) will be 
equal to $1/n_h$ times the second, and thus smaller than the second term for all $n_h > 1$, 
and much smaller for moderately large values of $n_h$. For example, in 
populations made up of clusters of different sizes for which the conditional probability 
of an element having a particular property for a fixed size of cluster is the same 
for all sizes of clusters, the two covariances will be very nearly equal. A number 
of practical problems approximate this situation. Moreover, even in the 
situations where the covariance of $\delta_{hj}$ and $N_{hj}$ is several times that of $\rho_{hj}$ and $N_{hj}$, 
say 5 times as large, then the second term will be larger than the first for all 
$n_h > 5$. 

Some numerical illustrations of the gains obtained through the use of the 
recommended system are given in Sec. VII, and for some of the items for which 
results are summarized in that section the gains were substantial. 


3. The derivation of the variance formulas (13) and (9). The mean square 
error of a ratio of random variables is generally approximated from Taylor’s 
expansion. If $X'$ and $Y'$ are random variables, $Y' > 0$, and $r$ is the population 
character of which $X'/Y' = r'$ is an estimate, then 


(29) 



E 


y/ 2 

(BY 7 )'- 





The first term in the right-hand side of (29) is a first approximation to the mean 
square error from Taylor’s expansion, and the second term is the error in this 
approximation. 

Eq. (13), and as a special case (9), is derived as follows: 


(SO) 


E(r' - r) 2 = E < 


A l 1 a h m h\j 

£ r £ £ £ Xw 

h vfi t j K 


L -l Sh 1 mh\ 1 
\ K tfi i 1 h 


- r>. 


I. ■, fl* 1 m/,,y 

Let I phiik = Y hilk (r hi} K ~ r), and Y' = £ f -£ £ £ ■ Then, setting 

Lh * i L 


(31) 


- iitjivv nijm ' / J —~-- ^ --/ ’ 


Ee 2 = EY'\r' - rf/{EY'y 

is the first approximation to the mean square error. 

Since EY' is evaluated in the same way as EX' (8), it is merely necessary to 
evaluate EY ,2 (r' — r) 2 , the numerator of Ed 2 . Now 

r . 1 ^*7> ^ j ”1^ 

EY'\r' - r) 2 = E £ i £ £ £ M 
L A *h * i k 

h h,g *h vg 

hf&q 

, Sk l mil Sh , 

where ^ = £ £ £ 'Awnt = £ . 

ilk i 
Since ® - -E E E $’1*/ tl + E 2 ^ 


h i 


h t,r 

Xj 6 T 


(32) JJK'V - r)' - £ £ 4 E *1: + B £ ? £ *Ul + S E £ £ 

x ' h h t h. k i,t ,. y 

i3^r 


A,<? 


The first term in the right-hand side of (32) is 

^ tf * * 4 ~ yfe 4 -Pa Mm, Mm, - 1 V ^ 


(33) 


l V’ 1 ^ Pa, ffl'Ai, Wkij 1 -rr - ' . ^2 

I Zj j2 73. 7|,r J1 r. _ i (Z-I Ykiik) 
— -I k 


tl, tl Ph Mm, Mm, 

The second term of (32) is 

w e Vt$^ - Sites*-)’-Sit?t'* 

where 


•Vr 


^A., = 2 hi)k ; 

k 


and the third term of (32) is 


< 35 > E £^ 

Wq 

Therefore EY' 2 (P - rf = (33) + (34) + (35), and when Y hlJk (r lli]k - r ) is 
substituted for f ht] , we have 

CT'V - r)* - Z-S’L^Tr, M m ~J t £ - 0 1 

4 L ( >7 ■* A Mbit -M-hxj 1 

+ E §■' 5^ W 1 ^ E ^(nt,* - r)] 2 

i.i r'h Mm, Mhi, — lk 

+ St“E?Sr h (r„-r) f 

J JTh » Mfcij 


tr p A m; 


At? 




(36) 
By substituting (r h , ]k — m, + r hij — rf for (r ht ,k — rf in the first term of 
(36) and PuMui/Phamuj for 1/fo in the 1st, 2nd, and 4th terms, the sum of these 
three terms becomes 


E Phi Mhil Phi ri V 2 _ \ 2 

"5" —— —r r h» Xhiikvhijk — T h %,) 

Ki.i.k Ph mu, Pi, 


(37) 


+ 2 £ 5- ~ |r F* YlAnv - n,',)(r A „ - r) 


+ 


s 5 = 0 F„„(n„ - [s yu - SA 

M»J x A ^ 2 -A» 7 ‘ L xVZa^J 

where P A <i = (Af A </ — mu,)/(Mu; — 1) and rut ~ 2 Xhfr/ 2 Yu,k . 

As fe 

When we substitute the appropriate value for l/t k in the 3rd, 5th, and 6th 
terms of (36), the sum of these terms becomes 


(38) 


2 5-’ IE jr 1 YUn., - r)T - 2 PE Yu,(r M , - r)T 

Ai 7 A [. i a hij J A L liJ •* A rji»j J 
[s £ I- 


Now 

(39) 


2 ~ Yhi,(m, - r) = 2 -Pa* fe' - A = P»(B W( a, - rftbuj.r) 

I i M < \“A(/ / 

= PhRh,U):r(fhiU) — T ) 

where hiu) — Rhj<.A)/Rkj(A):r , and 


2-i=r Y Ki ,(rhn — r) = 2 Pm{Ruw — tRuw.y) — PhiRnw — rfJ A ( X ) : r) 

(40) <•> P*« , 

= PsR H A),r(hu) — r) 

where f* U ) = Rhw/Rhw-.r • 

Substituting (39) and (40) in (38), we have 

2 (Pa./PA^i&wnrCiW) - r) 2 - 2 P A PMA>-r(fAu> - r) 2 

A»j A 

(41) 

+ [2 PhRhw.Y{f h u, — r)] 2 . 

h 

By substituting (f** w - f, lU ) + hu) — r) 2 for fan*) — r) 2 in the first term 
in (41) and expanding, (41) becomes 


2 P»?r Rh,(A) r{fuu) — rhwf +22 w Rh,u)-r(fh,(A) — ^aooOW) — r) 

hit x h h,i x A 

+ 2p*(iv.i) — I") a [2 Rhiu) -.7 ~~ i2*U)-rJ + [2 PhRh(A) ■.r(fh(A) ~ r)} 1 . 
Hence, since (EY') 2 EB 2 = (37) + (42), 


(43) 


,_, P AT _ ... X Y hil k{n,ik PAij) 
+ [E PhRhw-.rihu) - r)] 2 

K 

Mki, _ __ Mu/ 

where a Mj r = E (A.,* - Yutf/Mu, and = E Y hllk /M h ,j = Y ht JM ht ,. 

A: A: 

The approximation to $E(r' - r)^2$ is given by (43) divided by $(EY')^2$. By 
ignoring the 2nd, 5th, and 7th terms, which are negligible for a large class of 
populations, we obtain (13). 

The variance of $X'$ is derived from (43) by simply substituting $P_{hi}/P$ for 
$Y_{hijk}$ in (43). This follows from the considerations given below: 

Since $r' = X'/Y'$, and $X'$ is the numerator of $r'$, $\sigma^2_{X'}$ is given by $\sigma^2_{r'}$ when the 
denominator, $Y'$, is identically equal to unity in repeated samplings. 



Mu, Ph, _ A. 

Vl/ii, PWlhii Phi/ 


from (5), 


the denominator of r' which is equal to 

X E E E- Yhi,k, will be identically equal to unity in repeated sampling 

h * i Jc Whi] * hiJ 

when Yhijk is set equal to Ph*,/P where P = 2 Ph ■ 

The formula for the mean square error of X' (9), of course is exact since the 
error term 


E[Y ,2 /(EY , ?){r' - r} 2 = 0. 

It may be pointed out that er\> may be obtained directly and more simply 
without the use of (29) since X' is not estimated from the ratio of random 
variables. 

From (29), the error term for the approximation to E(r’ — r) 2 , (43) /(EY') 2 , is 


given by E 


( 


1 




if 


This cannot be expressed as a simple function 
of the individual observations, but useful maxima and minima for it may be 
obtained. A method for obtaining the upper and lower bounds of the variance 
of $r'$ is simply attained from the following inequalities which hold independent 
of the joint distribution of $X'$ and $Y'$. 

AT' 2 „ EY' 2 

(44) ~ (r' - r) < E(r' ~ r) 2 < (r 1 - r) 2 

X mux X xv, m 


where $Y_{\max}$ is the maximum value of $Y'$, obtained simply by choosing or 
estimating the largest $Y_h$ for each stratum. $Y_{\min}$ (the minimum value of $Y'$) 
is obtained in a similar manner. 

Eq. (44) when evaluated turns out to be 


(45) 


(EYyw (gnigg? 

y2 < A(r — r) < 2 

X max X min 


where ( EY'fliti 2 is given by (43). 

Eq. (45) will serve adequately as an indicator of the accuracy of Ed 2 for 
sampling systems in which the variability of the $Y$’s within strata is restricted. 
However, in other designs, where stratification is not used and the variability in the 
$Y$’s is not restricted, the limits given by (45) may be too broad to be useful. 
MULTIPLE SAMPLING WITH CONSTANT PROBABILITY 

By Walter Bartky 
The University of Chicago 

1. Introduction. In an attempt to reduce inspection costs, manufacturers 
have frequently resorted to sampling procedures in which the disposition of an 
aggregate or lot of similar units does not necessarily depend upon the results 
of a single sample. In practice, however, the number of permissible additional 
samples is limited to one or two; nevertheless, if the lot is very large, an 
appreciable reduction in the expected sample may be accomplished by allowing a 
greater number of additional samples. In this article probability formulae will 
be derived for an inspection procedure for infinite lots in which the number of 
additional samples is not limited and may be any number depending upon the 
results of the sampling. This development will be limited to the simple case of 
attribute inspection in which the units fall into two categories: satisfactory 
units or defective units. If $p$ denotes the fraction defective in an infinite lot, 
then the probability of finding exactly $m$ defective units or defects in a sample 
of $n$ is 

(1)    $P(m, n) = \binom{n}{m} p^m q^{n-m}, \qquad q = 1 - p.$

Since $P(m, n)$ is the probability of $m$ successes in $n$ trials with constant probability 
of success $p$, though the terminology of commercial inspection will be used in 
this article, the results are applicable to other situations involving repeated trials 
with constant probability of success. 

In contrast with multiple sampling, a single sample inspection procedure for 
lots of the type here considered is one in which a lot of units is accepted or 
rejected on the basis of the number of defective units found in the sample. Thus 
a lot is accepted if the number of defects is at most an integer $c$, the “acceptance 
number,” and rejected if the number exceeds $c$. For an infinite lot containing a 
fraction $p$ of defects and a sample of $n$ units, the probability of accepting is by (1) 

(2)    $\Pi(c, n) = \sum_{m \le c} P(m, n),$

and the probability for rejection is the difference between this sum and unity. 
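
Formulas (1) and (2) translate directly into a few lines of code. The Python sketch below is an illustration only: the function names and the numerical values of $c$, $n$ and $p$ are hypothetical and are not taken from the paper.

```python
from math import comb


def binom_prob(m, n, p):
    """P(m, n) of (1): probability of exactly m defects in a sample of n.

    Defined as zero when m lies outside 0..n.
    """
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def accept_prob_single(c, n, p):
    """Sum of P(m, n) over m <= c, as in (2): single-sample acceptance probability."""
    return sum(binom_prob(m, n, p) for m in range(0, c + 1))


# Hypothetical plan: accept when at most c = 1 defect appears in n = 20 units.
p_accept = accept_prob_single(c=1, n=20, p=0.05)
print("acceptance probability:", p_accept)
print("rejection probability: ", 1.0 - p_accept)
```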

2. Multiple sampling. The procedure in multiple sampling is to examine 
first an initial sample of $n_0$ units. If the number of defects in this initial sample 
is at most $c$ the lot is accepted and if the number of defects exceeds $c + k$ ($k$ an 
integer) the lot is rejected. But if the number of defects is greater than $c$ and 
less than $c + k + 1$ an additional sample is removed and examined. In the 
latter case similar criteria determine whether the lot is to be accepted or rejected 
or this method of sampling continued. With an infinite lot this scheme of 
sampling has an infinite variety of forms but there are certain advantages in limiting 
this discussion to the following type of multiple sampling procedure. 

I. Sample Sizes: The initial sample is of $n_0$ units but all additional samples 
are of the same size, namely $n$ units. 

II. Condition for Acceptance: The lot is accepted if the number of defects in 
the initial sample of $n_0$ units is at most $c$ or if after taking $r$ additional samples of $n$ 
the total number of defects in the $n_0 + rn$ units examined equals $c + r$. 

III. Condition for Rejection: The lot is rejected if the number of defects in 
the initial sample of $n_0$ units exceeds $c + k$ or if after taking $r$ additional samples of 
$n$ the total number of defects exceeds $c + r + k$. 

IV. Condition for an Additional Sample: An additional sample of n is taken 
only if neither condition II nor condition III is realized. 
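
The four conditions can be read as a simple decision rule applied to the running total of defects. The Python sketch below is one hypothetical way to write that rule; the function name, its return convention and the example counts are assumptions made for illustration.

```python
def disposition(defects, c, k):
    """Apply conditions II-IV to a non-empty sequence of per-sample defect counts.

    defects[0] is the count in the initial sample; defects[r] is the count in
    the r-th additional sample. Returns (decision, r), where r is the number
    of additional samples examined when the decision was reached.
    """
    total = 0
    for r, count in enumerate(defects):
        total += count
        # While sampling continues the total before the r-th additional sample
        # is at least c + r, so "at most c + r" and "equals c + r" coincide
        # for r >= 1; for r = 0 this is the initial-sample rule.
        if total <= c + r:
            return ("accept", r)
        if total > c + r + k:
            return ("reject", r)
    return ("continue", len(defects) - 1)


# Hypothetical plan with c = 1 and k = 2.
print(disposition([0], c=1, k=2))        # initial sample already acceptable
print(disposition([2, 0], c=1, k=2))     # decided on the first additional sample
print(disposition([3, 1], c=1, k=2))     # a further sample is still required
```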

Thus in this sampling scheme the level for acceptance as well as the level for 
rejection increases by unity for each additional sample of $n$. If at the $r$-th 
additional sample a lot is neither accepted nor rejected then the total number of 
defects in initial plus additional samples must equal one of the $k$ numbers 

$c + r + 1, \; c + r + 2, \; \ldots, \; c + r + k.$

Denote the probabilities for obtaining these numbers by 

(3)    $P_1(r), \; P_2(r), \; \ldots, \; P_k(r)$

respectively, the subscript indicating the number of defects in excess of the 
acceptance level. 

To be accepted on the $(r + 1)$-st additional sample, (a) no defect must be 
found in the $(r + 1)$-st additional sample and (b) a total of $c + r + 1$ defects 
must be found in previous samples. The probability of (a) is given by (1), 
taking $m$ equal to zero, and the probability of (b) is the first one in the set (3). 
Consequently the probability of accepting a lot on the $(r + 1)$-st additional 
sample is 

$P_0(r + 1) = q^n P_1(r).$

If $\Pi$ denotes the probability of eventually accepting the lot, 

(4)    $\Pi = \sum_{m \le c} P(m, n_0) + q^n [P_1(0) + P_1(1) + P_1(2) + \cdots],$

where the first term on the right is the probability of accepting on the initial 
sample and may be evaluated by means of (1). Furthermore 

(5)    $P_i(0) = P(c + i, n_0)$

and is by (1) the probability of finding $c + i$ defects in the initial sample. 
According to the notation (3) the probability of finding a total of $c + r + 1 + i$ 
defects in initial plus $r + 1$ additional samples, that is $i$ more defects than 
the acceptance level, is $P_i(r + 1)$. These probabilities may be expressed as 
linear combinations of the probabilities (3) with coefficients that are 
probabilities of the type (1). Thus 

(6)    $P_i(r + 1) = \sum_j P(i - j + 1, n)\,P_j(r)$

where the sum may be made to extend for $j = 1, 2, \ldots, k$, provided one defines 
(1) as equal to zero for negative $m$. By repeated application of this linear 
transformation it is possible to express the probabilities (3) for additional samples in 
terms of the probabilities (5) for the initial sample. Thus if $M$ denotes the $k \times k$ 
square matrix with elements 

(7)    $M_{ij} = P(i - j + 1, n) \qquad (i, j = 1, \ldots, k),$

by omitting subscripts and regarding $P(r)$ as a vector with elements given by 
(3), the linear transformation may be written 

(8)    $P(r + 1) = M P(r).$

Hence by repeated application of (8) 

(9)    $P(r) = M^r P(0) \qquad (r = 0, 1, 2, \ldots)$

provided the zero power of the matrix $M$ is defined as the identity matrix $I$. 
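
The matrix form of the recursion is easy to evaluate numerically. The Python sketch below builds the matrix of (7) and the initial vector of (5) and applies the transformation (8) repeatedly; all function names and the plan parameters used in the example are hypothetical.

```python
from math import comb


def binom_prob(m, n, p):
    """P(m, n) of (1), taken as zero for m outside 0..n."""
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def transition_matrix(k, n, p):
    """k x k matrix with elements P(i - j + 1, n); the 0-based indices used
    here leave the difference i - j unchanged."""
    return [[binom_prob(i - j + 1, n, p) for j in range(k)] for i in range(k)]


def initial_vector(c, k, n0, p):
    """Vector of P(c + i, n0) for i = 1, ..., k."""
    return [binom_prob(c + i, n0, p) for i in range(1, k + 1)]


def continuation_probs(c, k, n0, n, p, r):
    """Vector P(r) obtained by applying the matrix r times to P(0)."""
    M = transition_matrix(k, n, p)
    P = initial_vector(c, k, n0, p)
    for _ in range(r):
        P = [sum(M[i][j] * P[j] for j in range(k)) for i in range(k)]
    return P


# Hypothetical plan: c = 0, k = 2, n0 = n = 5, p = 0.1.
for r in range(4):
    print(r, continuation_probs(0, 2, 5, 5, 0.1, r))
```

The sum of the elements of each printed vector is the probability that sampling is still undecided after $r$ additional samples, so it should not increase with $r$.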

The probability $P_i(r)$ cannot exceed the probability of finding exactly 
$c + r + i$ defects in a single sample of $n_0 + rn$ units, that is, in the notation of (1), 
the probability $P(c + r + i, n_0 + rn)$. Since the latter probabilities approach 
zero as $r$ approaches infinity it follows that the limit of the elements of $P(r)$ as $r$ 
approaches infinity is zero. Thus with this multiple sampling procedure a lot 
is eventually either accepted or rejected. Furthermore since the matrix $M$ 
contains no negative elements and $P(0)$ may be chosen with all positive elements 
it follows that the elements of $M^r$ approach zero as $r$ approaches infinity or 

(10)    $\lim_{r \to \infty} M^r = 0$, the zero matrix. 

It can be demonstrated that since the limit (10) is the zero matrix the sum of 
the infinite geometrical series in the matrix $M$ is 

(11)    $I + M + M^2 + \cdots = (I - M)^{-1},$

where the right member is the reciprocal of the matrix $I - M$. Consequently 
the infinite sum of vectors 

(12)    $V = \sum_{r=0}^{\infty} P(r) = (I - M)^{-1} P(0).$

This infinite sum of vectors has elements $V_1, V_2, \ldots, V_k$ of which the first 
element is the sum in brackets occurring in the right member of (4). Hence the 
probability of eventually accepting the lot is 

(13)    $\Pi = \sum_{m \le c} P(m, n_0) + q^n V_1$

and is thus by (12) and (5) expressible in terms of probabilities for the initial 
sample, equations (1), and the reciprocal of the matrix $I - M$. 

In addition to the probability for acceptance one is also interested in the 
expected number, $E$, of additional samples. Since 

$\sum_i P_i(r - 1) \qquad (r = 1, 2, 3, \ldots),$

where the sum extends over all $i = 1, 2, \ldots, k$, is the probability of continuing 
to the $r$-th sample, it follows that 

$\sum_i P_i(r - 1) - \sum_i P_i(r)$

is the probability that the lot will be either accepted or rejected on the $r$-th sample. 
Therefore the expected number of additional samples is 

$E = \sum_{r > 0} r \left[ \sum_i P_i(r - 1) - \sum_i P_i(r) \right] = \sum_{r \ge 0} \sum_i P_i(r),$

or, on interchanging the order of summation and applying (12), 

(14)    $E = \sum_i V_i.$

That is, the expected number of additional samples equals the sum of the 
elements of the vector $V$. 
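
For small $k$ the vector $V$ can be obtained by solving the linear system with matrix $I - M$ and right member $P(0)$ directly, after which the acceptance probability and the expected number of additional samples follow at once. The Python sketch below does this with a plain Gaussian elimination; the helper names and the example plan are hypothetical, and the elimination routine is a generic one, not a method given in the paper.

```python
from math import comb


def binom_prob(m, n, p):
    """P(m, n) of (1), taken as zero for m outside 0..n."""
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for cc in range(col, size + 1):
                aug[r][cc] -= factor * aug[col][cc]
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def plan_characteristics(c, k, n0, n, p):
    """Return (probability of acceptance, expected number of additional samples)."""
    q = 1.0 - p
    M = [[binom_prob(i - j + 1, n, p) for j in range(k)] for i in range(k)]
    P0 = [binom_prob(c + i, n0, p) for i in range(1, k + 1)]
    A = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(k)] for i in range(k)]
    V = solve_linear(A, P0)
    accept_initial = sum(binom_prob(m, n0, p) for m in range(0, c + 1))
    return accept_initial + q ** n * V[0], sum(V)


# Hypothetical plan: c = 0, k = 1, n0 = n = 2, p = 0.5.
print(plan_characteristics(0, 1, 2, 2, 0.5))
```

For the plan in the example the chain can be followed by hand: sampling continues with probability one-half at every stage and the lot is accepted with probability one-quarter at every stage, which gives an acceptance probability of one-half and one additional sample on the average.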

Though it is possible to develop a general expression for the reciprocal of the matrix 
$I - M$, to determine the acceptance probability, $\Pi$, as well as the expected 
number of additional samples it is only necessary to evaluate $V$. Now by (12) this 
vector is the solution of the linear system of equations 

(15)    $(I - M)V = P(0).$

Though for $k$ small this system could be solved directly, in order to find a form 
of the solution applicable for any value of $k$, let the expansion in power series in 
$x$ be 

(16)    $[(px + q)^n - x]^{-1} = g_1 + g_2 x + g_3 x^2 + \cdots,$

where the coefficients, $g$, are functions of $p$ and $q$. On clearing of fractions and 
equating coefficients of like powers of $x$ it is found that 

(17)    $g_1 = q^{-n}$

and, by equating the coefficients of the first $k$ powers of $x$ and using the 
notation (7), 

(18)    $g_i - \sum_{j=1}^{k} M_{ij} g_j = \begin{cases} 0 & (i = 1, 2, \ldots, k - 1), \\ q^n g_{k+1} & (i = k). \end{cases}$
Similarly, if the expansion in power series is 

(19)    $\frac{\sum_i P_i(0)\,x^i}{(px + q)^n - x} = h_1 + h_2 x + h_3 x^2 + \cdots,$

where the sum is for all $i = 1, \ldots, k$, then by clearing of fractions and equating 
coefficients of like powers of $x$ it is found that 

(20)    $h_1 = 0,$

and 

(21)    $h_i - \sum_{j=1}^{k} M_{ij} h_j = \begin{cases} -P_i(0) & (i = 1, 2, \ldots, k - 1), \\ -P_k(0) + q^n h_{k+1} & (i = k). \end{cases}$

It follows from equations (18) and (21) that if 

(22)    $V_i = g_i h_{k+1}/g_{k+1} - h_i \qquad (i = 1, \ldots, k),$

then $V$, the vector with these elements, will satisfy equation (15). Since by (17) 
and (20) 

(23)    $V_1 = q^{-n} h_{k+1}/g_{k+1},$

the probability for eventually accepting the lot is by (13) expressible as 

(24)    $\Pi = \sum_{m \le c} P(m, n_0) + h_{k+1}/g_{k+1},$

while the expected number of additional samples is the sum of elements (22) 
of $V$. 

These results will now be summarized and simplified formulae derived for 
special cases. In the summary all probabilities are expressed by means of (5) 
in terms of the probabilities (1). 


3. Summary of multiple sampling formulas. For this multiple sampling 
procedure the initial sample is $n_0$ and the additional samples are $n$. A lot is 
accepted if on the $r$-th additional sample the total number of defects found is at 
most $c + r$ and rejected if the total exceeds $c + r + k$. An infinite lot containing a 
fraction $p$ of defects is either accepted or rejected, the probability of acceptance 
being given by 

(25)    $\Pi = \sum_{m \le c} \binom{n_0}{m} p^m q^{n_0 - m} + \frac{h_{k+1}}{g_{k+1}} \qquad (q = 1 - p),$

and the probability of rejection is $1 - \Pi$. The expected number of additional 
samples is 

(26)    $E = \frac{h_{k+1}}{g_{k+1}} \sum_i g_i - \sum_i h_i,$

where the sum extends over $i = 1, 2, \ldots, k$. The $g_i$ and $h_i$ are the coefficients 
in power series of $x$ in the expansions of: 

(27)    $\frac{1}{(px + q)^n - x} = g_1 + g_2 x + g_3 x^2 + \cdots,$

(28)    $\frac{\sum_i \binom{n_0}{c+i} p^{c+i} q^{n_0 - c - i} x^i}{(px + q)^n - x} = h_1 + h_2 x + h_3 x^2 + \cdots,$

where the sum is for all $i = 1, 2, \ldots, k$. These formulae apply to all finite 
values of $c$ and $k$ provided the binomial coefficient is zero for values of the 
argument falling outside those occurring in the ordinary expansion of an integral 
power of a binomial. 
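
The coefficients $g$ and $h$ can be generated numerically by ordinary power-series division, which gives a direct way of evaluating (25) and (26). The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, $n \ge 1$ is assumed, and the division routine is the usual term-by-term recurrence rather than anything prescribed in the paper.

```python
from math import comb


def binom_prob(m, n, p):
    """P(m, n) of (1), taken as zero for m outside 0..n."""
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def series_quotient(numer, denom, terms):
    """First `terms` power-series coefficients of numer(x) / denom(x).

    Both arguments list coefficients of x^0, x^1, ...; denom[0] must be nonzero.
    """
    out = []
    for t in range(terms):
        a_t = numer[t] if t < len(numer) else 0.0
        upper = min(t, len(denom) - 1)
        s = sum(denom[m] * out[t - m] for m in range(1, upper + 1))
        out.append((a_t - s) / denom[0])
    return out


def plan_from_series(c, k, n0, n, p):
    """Evaluate (25) and (26): returns (acceptance probability, E)."""
    # Coefficients of (px + q)^n - x in ascending powers of x (n >= 1 assumed).
    denom = [binom_prob(m, n, p) for m in range(n + 1)]
    denom[1] -= 1.0
    # g[0], ..., g[k] hold g_1, ..., g_{k+1}; likewise for h.
    g = series_quotient([1.0], denom, k + 1)
    numer_h = [0.0] + [binom_prob(c + i, n0, p) for i in range(1, k + 1)]
    h = series_quotient(numer_h, denom, k + 1)
    ratio = h[k] / g[k]
    accept_initial = sum(binom_prob(m, n0, p) for m in range(0, c + 1))
    prob_accept = accept_initial + ratio
    expected_additional = ratio * sum(g[:k]) - sum(h[:k])
    return prob_accept, expected_additional


# Hypothetical plan: c = 1, k = 3, n0 = 10, n = 5, p = 0.1.
print(plan_from_series(1, 3, 10, 5, 0.1))
```

The same quantities can be obtained by accumulating the probabilities of the successive samples term by term, which provides an independent check on the series route.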


4. Computation of coefficients g and h. If the denominator in (27) is first 
expanded in power series in 

$x(px + q)^{-n}$

and then the resulting negative powers of binomials expanded in power series in 
$x$, it is found that 



(29)    $g_1 = q^{-n}, \qquad g_2 = q^{-2n} - \binom{n}{1} p\,q^{-n-1}, \qquad \ldots,$

        $g_k = \sum_{m=0}^{k-1} (-1)^m \binom{(k - m)n + m - 1}{m} p^m q^{-kn + mn - m}.$


By (28) the coefficients $h$ are expressible in terms of the $g$’s, 

(30)    $h_1 = 0, \qquad h_k = \sum_{i=1}^{k-1} \binom{n_0}{c+i} p^{c+i} q^{n_0 - c - i}\,g_{k-i}, \qquad k > 1.$


Other expressions for the coefficients may be derived from the theory of 
functions of a complex variable. Thus by Cauchy's Integral Formula 

(31)    $g_{k+1} = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{dx}{x^{k+1}[(px + q)^n - x]}, \qquad h_{k+1} = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{S(x)\,dx}{x^{k+1}[(px + q)^n - x]},$

where 

(32)    $S(x) = \sum_i \binom{n_0}{c+i} p^{c+i} q^{n_0 - c - i} x^i$

and the closed path of integration $C$ in the complex plane only includes the pole 
at the origin. Since the integrands are rational functions and the point at 
infinity is not a singularity for either integrand, these integrals taken about the 
origin are equal to the negative sum of the corresponding integrals taken about 
the zeros of 

$(px + q)^n - x.$

If $p \ne n^{-1}$ it can be demonstrated that there are $n$ distinct zeros $x_1, x_2, \ldots, x_n$ 
corresponding to the solutions of the algebraic equation 

(33)    $(p x_s + q)^n = x_s \qquad (s = 1, \ldots, n).$

One solution is obviously 

(34)    $x_1 = 1,$

and for $p = n^{-1}$ this solution is a double root. 

The integrals about these zeros are obtainable from Cauchy’s Integral Formula 
and after integrating and simplifying the resulting sum by means of (33) it is 
found that for the case $p \ne n^{-1}$, 

(35)    $g_{k+1} = \frac{1}{1 - np} + \sum_{s=2}^{n} \frac{p x_s + q}{x_s^{k+1}[q - (n - 1) p x_s]},$

        $h_{k+1} = \frac{S(1)}{1 - np} + \sum_{s=2}^{n} \frac{(p x_s + q)\,S(x_s)}{x_s^{k+1}[q - (n - 1) p x_s]}.$


If the power series (27) is multiplied by the series 

$(1 - x)^{-1} = 1 + x + x^2 + x^3 + \cdots,$

the resulting product is 

$g_1 + (g_1 + g_2) x + (g_1 + g_2 + g_3) x^2 + \cdots,$

so that, by Cauchy’s Integral Formula, 

(36)    $G_k = \sum_{i=1}^{k} g_i = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{dx}{x^k (1 - x)[(px + q)^n - x]}.$

Similarly the sum of the coefficients $h$ that occur in the right member of (26) may 
be written 

(37)    $H_k = \sum_{i=1}^{k} h_i = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{S(x)\,dx}{x^k (1 - x)[(px + q)^n - x]}.$

The integrals (36) and (37) are of the same type as (31), and by employing the 
same method of integrating used in deriving (35), the following expressions for 
the sums of coefficients $g$ and $h$ occurring in (26) are obtained: 

(38)    $G_k = \sum_i g_i = \frac{k}{1 - np} - \frac{n(n - 1)p^2}{2(1 - np)^2} + \sum_{s=2}^{n} \frac{p x_s + q}{x_s^k (1 - x_s)[q - (n - 1) p x_s]},$

        $H_k = \sum_i h_i = \frac{k S(1) - S'(1)}{1 - np} - \frac{n(n - 1)p^2 S(1)}{2(1 - np)^2} + \sum_{s=2}^{n} \frac{(p x_s + q)\,S(x_s)}{x_s^k (1 - x_s)[q - (n - 1) p x_s]},$

provided $np \ne 1$. Here $S'(1)$ is the derivative of (32) with respect to $x$ evaluated 
for $x = 1$. For the special case $np = 1$, two of the roots of (33) are equal, 

$x_1 = x_2 = 1,$

and the integrals (36), (37) and (38) become respectively 

^ n + x. ~ 1 

(n - l)0 fc+ i = 2 kn + %n - i + Zj Xt ) > 

Tl ~f~ X _ 1 

(n - l)/t* + i = (2 kn + fn - i)S(l) - 2nS'(l) + E s ( x >)> 

(39) (n - 1) E 0 . = + Ifcn + -Arn “ + E x J(i J x ,f ’ 

(n - )1 E ^ = (fc’n -f + 'Arn — 4& — tV — In. X )<S(1) 

t 

- (i» - 4 + 2kn)S'{l) + nS"(l) + E S(x.) } 

where the sum extends over all roots of (33) that are not equal to unity. Here 
S'(l) and S"( 1) are the first and second derivatives of (32) with respect to x 
for x = 1 and p = n _1 . 

Formulas (35), (38) and (39) require for their evaluation the solutions of equa¬ 
tion (33). For n greater than unity there are just two positive real solutions, 
say Xi = 1 and . If n is even all other roots are complex numbers, while if n 
is odd they are complex with the exception of one negative real root. Conse¬ 
quently by (33) for a = 3, 4, ■ • • , n the absolute values of the roots satisfy the 
inequality 

(p ! x, | + ?)" > x ,, 

and consequently the | x, | cannot be between it and x %. But equation (33) may 
be written 


(px, + g)" - 1 = 1 
(p*. + g) - 1 p 


(s ^ 1) 
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so that for s = 2,3, • ■ • ,n 

(40) 2 (pa. + qY = 1/p 

t 

where the sum is taken for & = 0,1, ■ • • , n — 1 and therefore 

Z (p | x-1 + qY >l/p (s = 3, 4, • • • , n). 

t 

Now Xi is the only real and positive solution of (40), consequently, in order to 
satisfy the inequality, the absolute values of roots corresponding to s = 3, 
4, ■ • ■ , n must exceed x 2 . On combining this result with the former, it follows 
that 

(41) | a* | >1 and Xi. 

Consequently for large values of $k$ the most important terms in the right members
of (35), (38) and (39) correspond to the real positive roots $x_1 = 1$ and $x_2$ of
equation (33). By omitting the terms corresponding to $s = 3, \ldots, n$ one can derive
approximations to the $g$ and $h$ and their sums applicable for large $k$ values. In
fact for $np$ near unity the roots corresponding to $s = 3, 4, \ldots$ are considerably
greater than unity in absolute value, as is illustrated in the following table of
roots for the case $np = 1$:

$n = 2,\ p = 1/2$; $x_s = 1,\ 1$;

$n = 3,\ p = 1/3$; $x_s = 1,\ 1,\ -8$;

$n = 4,\ p = 1/4$; $x_s = 1,\ 1,\ -7 \pm 4\sqrt{-2}$;

$n = 5,\ p = 1/5$; $x_s = 1,\ 1,\ -12.2531\ldots,\ -4.8734\ldots \pm 7.7343\ldots\sqrt{-1}$

and for $s = 3, 4, 5, \ldots$, $|x_s|$ is at least 8.
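
The table of roots can be checked numerically. The following is a minimal sketch, assuming equation (33) is the polynomial equation $(px + q)^n = x$ quoted in the notation of Table I; the function name `roots_of_eq33` is introduced here for illustration only.

```python
import numpy as np

def roots_of_eq33(n: int, p: float) -> np.ndarray:
    """All n roots of (p*x + q)**n = x, with q = 1 - p."""
    q = 1.0 - p
    # (p*x + q)**n expanded as a polynomial in x.
    lhs = np.poly1d([p, q]) ** n
    # Subtract x to obtain (p*x + q)**n - x.
    poly = lhs - np.poly1d([1.0, 0.0])
    return poly.roots

# Case np = 1, as in the table above.
for n in (2, 3, 4, 5):
    r = roots_of_eq33(n, 1.0 / n)
    print(n, np.round(sorted(r, key=abs), 4))
```

Because $x = 1$ is a double root when $np = 1$, a general-purpose root finder returns it only to about half machine precision; the remaining roots are simple and are obtained accurately.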

For very large values of $n$ and small values of $p$ one can find approximate
values for the roots by solving the limit equation obtained from (33) by putting

$a = np$

and letting $n$ approach infinity. This equation is

(42) $e^{a(x_s - 1)} = x_s$,

where $e$ is the base of the natural logarithms. For the case $a = 1$, the roots are
$1$, $1$, $3.0891\ldots \pm 7.4602\ldots\sqrt{-1}$, $3.66\ldots \pm 13.88\ldots\sqrt{-1}$ and

_ 5(1 + logth) _ wzi) + by/- 1 approximately, 
b 2 -h 1 


where $b = (2u + 1/2)\pi$, $u = 4, 5, 6, \ldots$.

From equation (39) and these numerical results it follows that even with $k$ as
small as 3 the percentage error for the case $np = 1$ introduced in $g_k$ by omitting
the terms in the indicated sum is less than 0.002%. Consequently for all practical
purposes one may omit the complex and negative roots for values of $k$ greater
than 3 in computing the $g$'s for $np$ in the neighborhood of unity. For smaller
values of $k$ the exact values of the $g$'s are readily obtainable from (29).
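
The complex roots of the limit equation (42) can be located by Newton's method. The sketch below is an illustration, not the author's procedure; the function name `limit_root` and the starting points are assumptions chosen near the values quoted above for $a = 1$.

```python
import cmath

def limit_root(a: float, start: complex, tol: float = 1e-12, max_iter: int = 100) -> complex:
    """Newton iteration for exp(a*(x - 1)) = x, started from `start`."""
    x = start
    for _ in range(max_iter):
        e = cmath.exp(a * (x - 1.0))
        # F(x) = e - x and F'(x) = a*e - 1.
        step = (e - x) / (a * e - 1.0)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

# Starting points taken close to the first two complex roots listed for a = 1.
x3 = limit_root(1.0, 3.0 + 7.5j)
x5 = limit_root(1.0, 3.7 + 13.9j)
print(x3, x5)
```

The conjugates of the returned values are roots as well, since the equation has real coefficients. The computed roots agree with the figures printed above to roughly three significant figures.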

6. Special cases. Consider first the case in which $c < 0$ and $n_0 < k + c$.
With these conditions, under no circumstances could a lot be accepted or rejected
on the initial sample and the indicated sum in the right member of (25) is zero.
Furthermore for this case the sum (32) becomes

(43) $S(x) = (px + q)^{n_0} x^{-c}$.

Consequently it follows from (33) that

(44) $S(x_s) = x_s^{t-c}$,

where

(45) $t = n_0/n$.

It should be noted, however, that for $t$ not an integer the right member of (44) is
multiple valued and one must take that value for which

(46) $x_s^t = (p x_s + q)^{n_0}$.

Thus for real positive values of $x_s$, the right member of (44) is real. For integral
values of $t$ there is of course no ambiguity in the notation.

If (44) is substituted in the second equation of (35), the resulting expression
for the $h$ coefficient is of the same form as that for the $g$ coefficient, in fact

$h_{k+1} = g_{k-t+c+1}$,

so that by (25) the probability for acceptance is for this case

(47) $\Pi = g_{k-t+c+1}/g_{k+1}$.

In similar manner it follows from (43) and (46) that the sum of the $h$
coefficients, equation (38), is

$H_k = G_{k-t+c} + t$

and hence by (26) the expected number of additional samples is

(48) $E = \Pi G_k - G_{k-t+c} - t$.

Since the initial sample is $nt$ units and the additional samples are all equal to
$n$ units, the expected total number of units sampled, that is, initial plus
additional samples, is

(49) $I = n_0 + nE = n(\Pi G_k - G_{k-t+c})$.

Since for this case it is impossible to accept or reject on the initial sample one
could combine the initial sample with the first additional sample. In fact one
can continue combining initial and additional samples and thus increasing $c$ and $t$
provided the new initial sample $n_0$ and the new $c$ value thus obtained are such
that

(50) $c \le 0, \qquad n_0 = nt \le k - 1 + c$.

In this process of combining samples $t$ and $c$ increase at the same rate and
consequently formula (47) and the right member of (49) are unchanged. In other
words formulas (47) and (49) may also be used under conditions (50).

It was demonstrated in Section 3 that for $k$ sufficiently large one can omit
those terms in (35) and (38) corresponding to complex or negative roots of (33).
If this is done the following useful approximations for the $g$ and $G$ are obtained:

$g_k = (1 - np)^{-1} + [q - (n-1)px]^{-1}\, x^{-k+(1/n)}$,

(51) $G_k = k(1 - np)^{-1} - \tfrac{1}{2} n(n-1)p^2 (1 - np)^{-2} + [q - (n-1)px]^{-1} (1 - x)^{-1}\, x^{-k+(1/n)}$,

provided $np \neq 1$, $k \gg 1$ and $x$ is the real positive root of

(52) $(px + q)^n = x \qquad (np \neq 1)$

that is not equal to unity. For $np = 1$ these approximations become by (39)

(53) $(n - 1) g_k = 2kn + 2n/3 - 4/3$,

$(n - 1) G_k = k^2 n + 5kn/3 + n/18 - 4k/3 - 1/18 - n^{-1}/9, \qquad k \gg 1$.

These formulae in conjunction with formulae (47) and (49) give quite satisfactory
approximations for the probability for acceptance $\Pi$ and the expected
total number of units sampled even when values of the subscripts employed are
as small as 3. Of course the larger the value of $k$ in (51), (52) or (53) the better
these approximations.
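
The approximations (51) are straightforward to evaluate once the root of (52) is known. The sketch below is an illustration under stated assumptions: the root is found by bisection (a choice made here, not taken from the paper), the exponent $-k + 1/n$ is assumed to be the same in both lines of (51), and the names `second_root`, `g_approx` and `G_approx` are hypothetical.

```python
import math

def second_root(n: int, p: float, tol: float = 1e-13) -> float:
    """Positive root x != 1 of (p*x + q)**n = x, for n >= 2 and n*p != 1."""
    q = 1.0 - p
    f = lambda x: (p * x + q) ** n - x
    # f is convex; its minimum lies strictly between the roots 1 and x.
    x_min = ((1.0 / (n * p)) ** (1.0 / (n - 1)) - q) / p
    if n * p < 1.0:              # wanted root lies above the minimum
        lo, hi = x_min, 2.0 * x_min
        while f(hi) < 0.0:
            hi *= 2.0
    else:                        # wanted root lies between 0 and the minimum
        lo, hi = x_min, 0.0
    # Invariant of the bisection: f(lo) < 0 <= f(hi).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if abs(hi - lo) < tol * max(1.0, abs(mid)):
            break
    return 0.5 * (lo + hi)

def g_approx(k: int, n: int, p: float) -> float:
    """First line of (51); the exponent -k + 1/n is an assumption (see lead-in)."""
    q = 1.0 - p
    x = second_root(n, p)
    return 1.0 / (1.0 - n * p) + x ** (-k + 1.0 / n) / (q - (n - 1) * p * x)

def G_approx(k: int, n: int, p: float) -> float:
    """Second line of (51)."""
    q = 1.0 - p
    x = second_root(n, p)
    return (k / (1.0 - n * p)
            - 0.5 * n * (n - 1) * p ** 2 / (1.0 - n * p) ** 2
            + x ** (-k + 1.0 / n) / ((q - (n - 1) * p * x) * (1.0 - x)))

# Large n with np = log 2 approximates the limiting case tabulated in Table II.
n_big = 10 ** 6
print(g_approx(3, n_big, math.log(2) / n_big), G_approx(5, n_big, math.log(2) / n_big))
```

With $n$ large and $np = \log 2$ the two printed values should lie close to the Table II entries for $g_3$ and $G_5$ in the column $x = 2$.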

Now the root $x$ of (52) is greater or less than unity depending on whether the
product $a = np$ is less than or greater than unity. Consequently it follows from
(47) and (51) that for $c = 0$ and $t$ finite

(54) $\Pi' = \lim_{k \to \infty} \Pi = \lim_{k \to \infty} g_{k-t+1}/g_{k+1} = \begin{cases} 1, & np < 1; \\ x^t, & np > 1; \end{cases}$

while by (49) and (51) the expected total number of units sampled has the
limiting value

(55) $I' = \lim_{k \to \infty} I = \begin{cases} nt(1 - np)^{-1}, & np < 1; \\ \infty, & np > 1. \end{cases}$

But $k$ infinite implies that under no circumstance can a lot be rejected.
Consequently $\Pi'$ and $I'$ are the exact values of the probability for acceptance and the
expected total sample respectively for the following sampling procedure:

The initial sample is $n_0 = nt$ and all additional samples are $n$. The lot is
accepted if on the initial sample no defects are found or if after taking $r$
additional samples a total of exactly $r$ defects is found.

In inspection problems $p$ is usually small and $n$ large so that the
approximation (42) may be used to determine the real positive root $x$, thus

(56) $e^{a(x-1)} = x \qquad (a = np)$.

It then follows from (54) and (56) that for $np > 1$

(57) $\dfrac{-\log \Pi'}{1 - x} = n_0 p, \qquad \dfrac{-\log x}{1 - x} = np$.

These relations are of course equivalent to (54) and (56). Suppose that the
probability $\Pi'$ and the fraction $p$ are assigned. Then the initial sample $n_0$
and additional sample $n$ will depend on only the parameter $x$. Consider next
the problem of sampling a number of lots that fall into two categories, namely
those containing a fraction $p$ of defects and those containing a fraction $p^*$ of
defects where $p^* < p$. If in addition the sampling procedure is to be such that
lots with fraction $p^*$ of defects are eventually accepted, but lots with fraction $p$
of defects have a small assigned probability of acceptance $\Pi'$, then whatever
the value of $x$, as long as the resulting $np^* < 1$, these conditions are satisfied.
Furthermore if one insists that the expected total sample for lots containing a
fraction $p^*$, namely by (55)

$I'(p^*) = n_0 (1 - np^*)^{-1}$,

be a minimum, then it is found that

(58) $x = p^*/p$.

This remarkably simple result is capable of still greater generalization. By an 
altogether different approach to the problem the author has succeeded in proving 
that of all possible multiple sampling procedures, the multiple sampling method 
here described and defined by equations (57) and (58) gives the minimum 
expected inspection for the problem under consideration provided n is sufficiently 
large. 1 
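
Relations (57) and (58) translate directly into sample sizes. The sketch below is a hedged illustration: the function name `optimal_plan` and its argument names are hypothetical, the sizes are returned as real numbers (rounding to whole units is left to the user), and $I'(p^*)$ is taken from the $np < 1$ branch of (55).

```python
import math

def optimal_plan(p_bad: float, p_good: float, accept_prob: float):
    """Sample sizes from (57) with x chosen by (58); requires 0 < p_good < p_bad."""
    x = p_good / p_bad                                   # equation (58)
    n = -math.log(x) / ((1.0 - x) * p_bad)               # second relation of (57)
    n0 = -math.log(accept_prob) / ((1.0 - x) * p_bad)    # first relation of (57)
    expected_good = n0 / (1.0 - n * p_good)              # I'(p*) from (55)
    return x, n0, n, expected_good

# Illustrative figures only: p = 0.05, p* = 0.01, assigned acceptance probability 0.1.
x, n0, n, expected_good = optimal_plan(0.05, 0.01, 0.1)
print(x, n0, n, expected_good)
```

Substituting the returned values back into (56) and into $\Pi' = x^t$ with $t = n_0/n$ recovers the assigned inputs, which is a convenient consistency check.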

By letting both — c and k approach infinity it is possible to derive probability 
formulae for sampling procedure in which a lot is either rejected or the sampling 
continues without end. These formulae are included in Table I along with 
other special cases derived from previously listed general formulae. 


1 Note: The author has postponed publication of this proof in the hope that it might
be generalized to include sampling problems involving both acceptance and rejection of
a lot.

TABLE I


Notation:

$n$ = number of units in each additional sample
$n_0$ = number of units in initial sample
$p$ = fraction defective in lot
$a = np$
$q = 1 - p$
$c$ = maximum number of defects in initial sample for acceptance
$t = n_0/n$ = ratio of initial sample to additional samples
$f = c + k + 1$ = minimum number of defects in initial sample for rejection
$c + r$ = number of defects in initial plus first $r$ additional samples for acceptance
$f + r = c + k + 1 + r$ = minimum number of defects in initial plus first $r$ additional samples for rejection
$\Pi$ = probability of eventually accepting lot with fraction $p$ defects
$1 - \Pi$ = probability of eventually rejecting lot with fraction $p$ defects
$I$ = expected total number of units sampled (i.e., initial plus whatever additional samples are sampled)
$x$ = real positive root different from unity of the equation $(px + q)^n = x$.

Conditions 

n 

I 

k = 1 
(a) c =0 
/ = 2 

"os/ 1- ( n ~ n °) V<l n ~ y 

v 1 - (2'"‘ - 

' 1 — npq n ~ l 

1 — npq n 1 

k = 1 
c - 0 

(b b -2 

no = n 

q n (l - npg 71-1 ) -1 

n( 1 — npq"~ x )~ l 

/ \ C —— ““ k 

<c) i «i 


no + ng" 0 n Gk/gh+1 

k = 1 
(d) c = -1 
/ = 1 

q na+n a - nprrT 1 

no + n 2 n °(l — npq n ~ l y l 

k = 2 
(e) e = -2 
/ = 1 


^0 + 

nq n °(l + q n — npq n ~ l ) 

1 "-1 l 7l ( n + 1) Jl In- 2 

x aivyq T-o- V U ? , i\ 

2 1-2 np3 n - 1 + n(n o +1) pV"- 2 






TABLE I —Concluded 


Conditions 

n 

I 

Jfc = —c 

(f) = » 

/ = 1 

0 for np > 1 

~ np) for np < 1* 

no -f- nx(l — x) _l for np > 1 

» for np < 1 

c = 0 
n = 2 

<*>« -r 

= k + 1 

I 

no(2II - I) 

1 + (p/3)"“ 

5 - V 

c = 0 
n = 2 
(h) no = / 

= fc + 1 
P = 1/2 

0.5 

n\ 

(i) ° " ° 
n a — n 

Hk/gk+i 

n(UGk ~ Gk-x) 

Ci 

11 11 

8 ° 

1 (np < 1) 

/ t ' n (np > 1) 

no(l — np) -1 (np < 1) 
co (np > 1) 


* In this sampling procedure a lot cannot be accepted so that $\Pi$ is the probability
that additional samples will be taken without end. The probability of rejecting a lot
is however $1 - \Pi$.


TABLE II

Values of g and G for Limit n = ∞, p = 0

np = a   0.2558   0.4024   0.6931   1        1.3863   2.0118   2.5584
x        10       5        2        1        .5       .2       .1

g_1      1.292    1.495    2.000    2.718    4.000    7.477    12.915
g_2      1.338    1.634    2.614    4.671    10.455   40.86    133.76
g_3      1.3432   1.665    2.935    6.667    23.48    208.2    1343.2
g_4      1.3437   1.6717   3.097    8.667    49.55    1045.    13.4 × 10^3
g_5               1.6729   3.178    10.667   101.70   5228.    134 × 10^3
g_∞      1.3438            3.2588   ∞        ∞        ∞        ∞

G_1      1.292    1.495    2.000    2.718    4.000    7.477    12.915
G_2      2.629    3.130    4.614    7.389    14.45    48.34    146.7
G_3      3.972    4.795    7.549             37.93    256.5    1490
G_4      5.316    6.467    10.65    22.72    87.5              14.9 × 10^3
G_5      6.660    8.140    13.82    33.39    189.2             149 × 10^3

As an illustration of the method of application of these formulae, suppose that
the sampling procedure is to be such that the probability, $\Pi$, of accepting a
“p” value of $0.5 + \epsilon$ equals the probability of rejecting a “p” value of $0.5 - \epsilon$.
This condition on probabilities is by Table I, formula (g), always satisfied if
$c = 0$, $n = 2$, and $n_0 = k + 1$. This corresponds to a multiple sampling scheme
in which additional samples are only two units each and a lot is accepted or
rejected on initial sample if none or all units are defective. With $\epsilon = 0.1$ and
$\Pi \le 1/6$, one can take $n_0 = 4$ and $k = 3$. The expected total number of units
examined depends on “p” and varies for this numerical case from 4, for $p = 0$
or 1, to a maximum of 16, for $p = 0.5$. Nevertheless a single sample plan
satisfying the same conditions would require a sample of 23 units whatever the
value of p.
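
The figures in this illustration can be checked by treating the plan as an absorbing Markov chain. The sketch below is not the paper's derivation; it assumes the rule stated in the notation of Table I (accept once total defects are at most $c + r$, reject once they reach $c + k + 1 + r$, after $r$ additional samples), and the names `binom_pmf` and `plan_characteristics` are hypothetical.

```python
import numpy as np
from math import comb

def binom_pmf(n: int, p: float) -> np.ndarray:
    """Probabilities of 0..n defects in a sample of n units."""
    return np.array([comb(n, d) * p**d * (1 - p)**(n - d) for d in range(n + 1)])

def plan_characteristics(p: float, n0: int, n: int, c: int, k: int):
    """Acceptance probability and expected units sampled for the assumed rule.

    State s = (defects so far) - c - (number of additional samples taken).
    The lot is accepted when s <= 0 and rejected when s >= k + 1.
    """
    step = binom_pmf(n, p)                  # defects in one additional sample
    Q = np.zeros((k, k))                    # transitions among states 1..k
    hit_accept = np.zeros(k)                # one-step acceptance probabilities
    for s in range(1, k + 1):
        for d, pr in enumerate(step):
            t = s + d - 1                   # each extra sample raises the limits by 1
            if t <= 0:
                hit_accept[s - 1] += pr
            elif t <= k:
                Q[s - 1, t - 1] += pr
    A = np.eye(k) - Q
    accept_from = np.linalg.solve(A, hit_accept)    # P(accept | state s)
    samples_from = np.linalg.solve(A, np.ones(k))   # E(additional samples | state s)
    prob_accept, extra = 0.0, 0.0
    for d, pr in enumerate(binom_pmf(n0, p)):
        s = d - c
        if s <= 0:
            prob_accept += pr
        elif s <= k:
            prob_accept += pr * accept_from[s - 1]
            extra += pr * samples_from[s - 1]
    return prob_accept, n0 + n * extra

for p in (0.0, 0.4, 0.5, 0.6, 1.0):
    print(p, plan_characteristics(p, n0=4, n=2, c=0, k=3))
```

For the plan $n_0 = 4$, $n = 2$, $c = 0$, $k = 3$ this gives an expected total of 16 units at $p = 0.5$ and 4 units at $p = 0$ or $1$, in agreement with the figures quoted above.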

The previous problem is, however, not typical of those encountered in commercial
inspection, for in such situations $p$ is usually very small. In practice
one can generally replace the formulae in Table I by their limiting values for
$n = \infty$, $p = 0$, and $np = a$. Table II gives the limiting values of the $g$ and $G$
as well as $x$ for a small number of values of $a$.

Finally the justification for multiple sampling lies in the fact that a reduction
in the expected total sample is possible. Though this paper is limited to the
consideration of a very elementary type of sampling, it indicates that it might be
worth while to investigate the possibility of utilizing the methods of multiple
sampling in inspection for variables. Unfortunately serious mathematical
difficulties are encountered even in so simple a problem as multiple sampling
from a normal population for the mean.



AN EXACT TEST FOR RANDOMNESS IN THE NON-PARAMETRIC CASE 
BASED ON SERIAL CORRELATION 1 


By A. Wald and J. Wolfowitz 
Columbia University 


1. Introduction. A sequence of variates $x_1, \ldots, x_N$ is said to be a random
series, or to satisfy the condition of randomness, if $x_1, \ldots, x_N$ are independently
distributed with the same distribution; i.e., if the joint cumulative distribution
function (c.d.f.) of $x_1, \ldots, x_N$ is given by the product $F(x_1) \cdots F(x_N)$ where
$F(x)$ may be any c.d.f.

The problem of testing randomness arises frequently in quality control of
manufactured products. Suppose that $x$ is some quality character of a product
and that $x_1, \ldots, x_N$ are the values of $x$ for $N$ consecutive units of the
product arranged in some order (usually in the order they were produced). The
production process is said to be in a state of statistical control if the sequence
$(x_1, \ldots, x_N)$ satisfies the condition of randomness. A number of tests of
randomness have been devised for purposes of quality control, all having the
following features in common: 1) They are based on runs in the sequence $x_1, \ldots,
x_N$. 2) The test procedure is invariant under topologic transformation of the
x-axis, i.e., the test procedure leads to the same result if the original variates
$x_1, \ldots, x_N$ are replaced by $x'_1, \ldots, x'_N$ where $x'_\alpha = f(x_\alpha)$ and $f(t)$ is any
continuous and strictly monotonic function of $t$. 3) The size of the critical region,
i.e., the probability of rejecting the hypothesis of randomness when it is true,
does not depend on the common c.d.f. $F(x)$ of the variates $x_1, \ldots, x_N$.
Condition (3) is a fortiori fulfilled if condition (2) is satisfied and if $F(x)$ is continuous.
The fulfillment of condition (3) is very desirable, since in many practical
applications the form of the c.d.f. $F(x)$ is unknown.

Tests of randomness are of importance also in the analysis of time series
(particularly of economic time series) where they are frequently based on the
so-called serial correlation. The serial correlation coefficient with lag $h$ is defined
by the expression$^2$ (see, for instance, Anderson [1])

(1) $R_h = \dfrac{\sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha} - \left(\sum_{\alpha=1}^{N} x_\alpha\right)^2 / N}{\sum_{\alpha=1}^{N} x_\alpha^2 - \left(\sum_{\alpha=1}^{N} x_\alpha\right)^2 / N}$,

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of $\alpha$ for which $h + \alpha > N$.
The distribution of $R_h$ has recently been studied by R. L. Anderson [1], T.
Koopmans [2], L. C. Young [3], J. v. Neumann [4, 5], B. I. Hart and J. v. Neumann
[6], and J. D. Williams [7], under the assumption that $x_1, \ldots, x_N$ are
independently distributed with the same normal distribution. Thus, in addition
to the randomness of the series $(x_1, \ldots, x_N)$ it is assumed that the common
c.d.f. of the variates $x_1, \ldots, x_N$ is normal. This is a restrictive assumption
since frequently the form of the common c.d.f. $F(x)$ of the variates $x_1, \ldots, x_N$
is unknown.

1 Presented to the Institute of Mathematical Statistics and the American Mathematical
Society at a joint meeting at New Brunswick, New Jersey, on September 13, 1943.

2 Some authors (see, for instance, [2] p. 27, equation (61)) use a non-circular definition,
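
As an illustration, the circular coefficient can be computed in a few lines. This is a minimal sketch, assuming definition (1) is the circular form attributed to Anderson [1] (cross-product sum and sum of squares, each corrected by $(\sum x_\alpha)^2/N$); the function name is hypothetical.

```python
def circular_serial_correlation(x, h=1):
    """Circular serial correlation coefficient with lag h (assumed form of (1))."""
    N = len(x)
    total = sum(x)
    # Subscripts larger than N wrap around to the start of the series.
    cross = sum(x[a] * x[(a + h) % N] for a in range(N))
    squares = sum(v * v for v in x)
    return (cross - total**2 / N) / (squares - total**2 / N)

print(circular_serial_correlation([1, 2, 3, 4, 5, 6], h=1))
```

The coefficient is unchanged when a constant is added to every observation, since both numerator and denominator are measured about the mean. It is undefined when all observations are equal.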

The purpose of this paper is to develop a test procedure based on $R_h$ such that

(a) if $F(x)$ is continuous the size of the critical region does not depend on the
common c.d.f. $F(x)$ of the variates $x_1, \ldots, x_N$, thus making an exact test of
significance possible also when nothing is known about $F(x)$ except its continuity;

(b) if $F(x)$ is not continuous, but all its moments are finite and its variance is
positive, the size of the critical region approaches, as $N \to \infty$, the value it would
have if $F(x)$ were continuous. Thus in the limit an exact test is possible in this
case as well. We will refer to the case where the form of $F(x)$ is unknown as the
non-parametric case, in contrast to the case when it is known that $F(x)$ is a
member of a finite parameter family of c.d.f.'s.

The test based on the serial correlation seems to be suitable if the alternative
to randomness is the existence of a trend$^3$ or of some regular cyclical movement in
the data. In the analysis of time series it is frequently assumed that this is the
case and this is perhaps the reason why tests based on serial correlation are
widely used in the analysis of time series. In quality control of manufactured
products the existence of a trend is often considered as the alternative to
randomness, caused perhaps by the steady deterioration of a machine in the production
process. Thus, tests of randomness based on serial correlation could also be
used in quality control.

2. An exact test procedure based on $R_h$. Let $a_\alpha$ be the observed value of
$x_\alpha$ $(\alpha = 1, \ldots, N)$. Consider the subpopulation where the set $(x_1, \ldots, x_N)$ is
restricted to permutations of $a_1, \ldots, a_N$. In this subpopulation the probability
that $(x_1, \ldots, x_N)$ is any particular permutation $(a'_1, \ldots, a'_N)$ of $(a_1, \ldots,
a_N)$ is equal to $1/N!$ if the hypothesis to be tested, i.e., that of randomness, is
true. (If two of the $a_i$ $(i = 1, 2, \ldots, N)$ are identical we assume that some
distinguishing index is attached to each so that they can then be regarded as distinct
and so that there still are $N!$ permutations of the elements $a_1, \ldots, a_N$.)

The probability distribution of $R_h$ in this subpopulation can be determined as
follows: Consider the set of $N!$ values of $R_h$ which are obtained by substituting
for $(x_1, \ldots, x_N)$ all possible permutations of $(a_1, \ldots, a_N)$. (A value which
occurs more than once is counted as many times as it occurs.) Each of these
values of $R_h$ has the probability $1/N!$. On the basis of this distribution of $R_h$
an exact test of significance can be carried out. Suppose that $\alpha$ is the level of
significance, i.e., the size of the critical region. We choose as critical region a
subset of $M$ values out of the set of $N!$ values of $R_h$ where $M/N! = \alpha$. The
subset of $M$ values which constitute the critical region will depend in each particular
problem on the possible alternatives to randomness. For example, if a linear
trend is the only possible alternative to randomness, then the critical region will
consist of the $M$ largest values$^4$ of $R_h$. The value of the lag $h$ will also be chosen
on the basis of the alternatives under consideration. For instance, if some
cyclical movement in the data is suspected the choice of $h$ will depend on the
form of these cycles. The general idea underlying the choice of the subset of $M$
values and of the lag is to make the power of the test with respect to the
alternatives which are particularly feared as high as possible.

3 If the existence of a trend is feared it may be preferable to use the non-circular statistic
discussed, for example, in [2].
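
For small $N$ the exact test can be carried out by enumerating all $N!$ permutations. The sketch below is an illustration under stated assumptions: it uses the cross-product sum $\sum x_\alpha x_{h+\alpha}$ in place of $R_h$ (within the subpopulation the two differ only by a linear transformation, so the ordering of values is the same), it takes the largest values as the critical region, and the function names are hypothetical.

```python
from itertools import permutations

def circular_cross_sum(x, h=1):
    """Sum of x[a] * x[a + h] with subscripts wrapping around."""
    N = len(x)
    return sum(x[a] * x[(a + h) % N] for a in range(N))

def exact_upper_tail_p_value(observed, h=1):
    """Fraction of the N! permutations whose statistic is at least the observed one."""
    obs = circular_cross_sum(observed, h)
    count = total = 0
    for perm in permutations(observed):   # equal values are treated as distinct
        total += 1
        if circular_cross_sum(perm, h) >= obs - 1e-12:
            count += 1
    return count / total

print(exact_upper_tail_p_value([1, 2, 3, 4, 5, 6]))
```

The hypothesis of randomness is rejected at level $\alpha$ when the returned fraction does not exceed $\alpha$. Enumeration costs $N!$ evaluations, so this is practical only for very short series.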

If $R_h$ has the same value for several permutations of $(a_1, \ldots, a_N)$, it may be
impossible to have a critical region consisting of exactly $M$ values of $R_h$. For
example, if $a_1 = a_2 = \cdots = a_N$, then all the $N!$ values of $R_h$ are equal, and the
number of values of $R_h$ included in the critical region must be either 0 or $N!$. If
$F(x)$ is continuous the probability that two values of $R_h$ be equal is zero. This
explains why an exact test is always possible when $F(x)$ is continuous. On the
other hand, if $F(x)$ is not continuous, the probability that several values of $R_h$
be equal is positive. However, the theorem we shall prove in Section 4 shows
that in the limit an exact test is possible even when $F(x)$ is not continuous, but
has finite moments and a positive variance. For if the latter is true, the
probability is one that the weaker conditions for the validity of our theorem
(given at the end of Section 4) will be fulfilled.

Consider the statistic

(2) $\bar{R}_h = \sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha}$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of $\alpha$ for which $h + \alpha > N$.
Since in the subpopulation under consideration $\sum_{\alpha=1}^{N} x_\alpha$ and $\sum_{\alpha=1}^{N} x_\alpha^2$ are
constants, the statistic $R_h$ is a linear function of $\bar{R}_h$ in this subpopulation. Hence,
the test based on $\bar{R}_h$ is equivalent to the test based on $R_h$. Since $\bar{R}_h$ is simpler
than $R_h$, in what follows we shall restrict ourselves to the statistic $\bar{R}_h$.

We shall now show that, if $h$ is prime to $N$, the totality $T_h$ of the $N!$ values
taken by $\bar{R}_h$ is the same as $T_1$, the totality of the $N!$ values taken by $\bar{R}_1$.

In the argument which follows it is to be understood that, whenever a positive
integer is greater than $N$, it is to be replaced by that positive integer less than or
equal to $N$ which differs from it by an integral multiple of $N$.

Clearly it will be sufficient to show the existence of a permutation $p_1, p_2, \ldots,
p_N$ of the first $N$ integers such that

$p_i + 1 = p_{i+h} \qquad (i = 1, 2, \ldots, N)$.

Such a permutation is given by

(3) $p_{(j-1)h+1} = j \qquad (j = 1, 2, \ldots, N)$.

For if $j \neq j'$ then $(j - 1)h + 1 \neq (j' - 1)h + 1$ because $h$ is prime to $N$. Hence
to every positive integer $i$ there is a unique positive integer $j$ $(i, j \le N)$ such
that

$i = (j - 1)h + 1$.

Now

$p_i + 1 = p_{(j-1)h+1} + 1 = j + 1 = p_{jh+1} = p_{i+h}$,

which is the required result.

4 See footnote 3.
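
The effect of this permutation can be seen on a small example: walking through the subscripts in steps of $h$ rearranges the series so that its lag-1 circular sum equals the lag-$h$ sum of the original. The following is a sketch with hypothetical function names and zero-based subscripts.

```python
from math import gcd

def circular_lag_sum(x, h):
    """Sum of x[a] * x[a + h] with subscripts taken modulo N."""
    N = len(x)
    return sum(x[a] * x[(a + h) % N] for a in range(N))

def lag_h_to_lag_1(x, h):
    """Rearrange x so that its lag-1 circular sum equals the lag-h sum of x."""
    N = len(x)
    if gcd(h, N) != 1:
        raise ValueError("h must be prime to N")
    # Visit subscripts 0, h, 2h, ... (mod N); each is reached exactly once.
    return [x[(j * h) % N] for j in range(N)]

x = [3, 1, 4, 1, 5, 9, 2]
print(circular_lag_sum(x, 3), circular_lag_sum(lag_h_to_lag_1(x, 3), 1))
```

Since the rearrangement is itself a permutation of the observations, every value of the lag-$h$ statistic also occurs among the values of the lag-1 statistic, which is the content of the argument above.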

In what follows we shall restrict ourselves to the case when $h$ is prime to $N$.
This is not a very restrictive assumption since in practice $h$ will be small as
compared with $N$ and by omitting a few observations we can always make $N$ prime
to $h$. Since $T_h$ is the same as $T_1$ we shall deal with the statistic $\bar{R}_1$ only. To
simplify the notation we shall write $R$ instead of $\bar{R}_1$. Thus, the test procedure
will be based on the statistic

(4) $R = \sum_{\alpha=1}^{N-1} x_\alpha x_{\alpha+1} + x_N x_1$.

If $N$ is very small an exact test of significance can be carried out by actually
calculating the $N!$ possible values of $R$. However, this procedure is practically
impossible if $N$ is not small. In Section 3 the exact mean value and variance of
$R$ will be calculated, and in Section 4 the normality of the limiting distribution
of $R$ will be proved. Thus, if $N$ is sufficiently large so that the limiting
distribution of $R$ can be used, a test of significance can easily be carried out. Difficulties
in carrying out the test arise if $N$ is neither sufficiently small to make the
computation of the $N!$ values of $R$ practically possible, nor sufficiently large to permit the
use of the limiting distribution. In such cases it may be helpful to determine
the third and fourth, and perhaps higher, moments of $R$, on the basis of which
upper and lower limits for the cumulative distribution of $R$ can be derived.
(For a description of the Tchebycheff inequalities by which this can be done see,
for example, Uspensky [8], pp. 373-380.) Since the limiting distribution is
normal it may be useful to approximate the distribution by a Gram-Charlier
series or to employ similar methods.


3. Mean value and variance of R.$^5$ It is clear that

(5) $E(R) = N E(x_1 x_2) = N \cdot \dfrac{1}{N(N-1)} \sum_{\alpha \neq \beta} a_\alpha a_\beta = \dfrac{1}{N-1}\left[(a_1 + \cdots + a_N)^2 - (a_1^2 + \cdots + a_N^2)\right]$.

To calculate the variance of $R$ we first calculate the second moment of $R$ about
the origin. We have

(6) $E(R^2) = E(x_1 x_2 + \cdots + x_{N-1} x_N + x_N x_1)^2 = N E(x_1^2 x_2^2) + 2N E(x_1 x_2^2 x_3) + (N^2 - 3N) E(x_1 x_2 x_3 x_4)$.

5 The first four moments of a similar statistic have been obtained by Young [3].

To express the expected values $E(x_1^2 x_2^2)$, $E(x_1 x_2^2 x_3)$, and $E(x_1 x_2 x_3 x_4)$ we shall introduce
the following notations for the symmetric functions of $a_1, \ldots, a_N$: For any
set of positive integers $i_1, i_2, \ldots, i_k$ the symbol $S_{i_1 i_2 \cdots i_k}$ denotes the
symmetric function $\sum_{\alpha_1} \cdots \sum_{\alpha_k} a_{\alpha_1}^{i_1} \cdots a_{\alpha_k}^{i_k}$ where the summation is to be taken
over all possible sets of $k$ positive integers $\alpha_1, \ldots, \alpha_k$ subject to the restriction
that $\alpha_u \le N$ and $\alpha_u \neq \alpha_v$ for $u \neq v$ $(u, v = 1, \ldots, k)$.

From (6) we easily obtain

(7) $E(R^2) = \dfrac{N}{N(N-1)} S_{22} + \dfrac{2N}{N(N-1)(N-2)} S_{121} + \dfrac{N^2 - 3N}{N(N-1)(N-2)(N-3)} S_{1111} = \dfrac{S_{22}}{N-1} + \dfrac{2 S_{121}}{(N-1)(N-2)} + \dfrac{S_{1111}}{(N-1)(N-2)}$.

It will probably facilitate computation to express each of the symmetric
functions in the right member of (7) by a sum of terms, each a product of factors
$S_r$ $(r = 1, 2, \ldots)$. One can easily verify the relationships


(8) $S_{11} = S_1^2 - S_2$

(9) $S_{12} = S_{21} = S_1 S_2 - S_3$

(10) $S_{13} = S_{31} = S_1 S_3 - S_4$

(11) $S_{22} = S_2^2 - S_4$

(12) $S_{111} = S_{11} S_1 - 2 S_{12} = (S_1^2 - S_2) S_1 - 2(S_1 S_2 - S_3) = S_1^3 - 3 S_1 S_2 + 2 S_3$

(13) $S_{112} = S_{121} = S_{211} = S_{11} S_2 - 2 S_{13} = (S_1^2 - S_2) S_2 - 2(S_1 S_3 - S_4) = S_1^2 S_2 - S_2^2 - 2 S_1 S_3 + 2 S_4$

(14) $S_{1111} = S_{111} S_1 - 3 S_{112} = S_1^4 - 3 S_1^2 S_2 + 2 S_1 S_3 - 3 S_1^2 S_2 + 3 S_2^2 + 6 S_1 S_3 - 6 S_4 = S_1^4 - 6 S_1^2 S_2 + 8 S_1 S_3 + 3 S_2^2 - 6 S_4$.

It follows from (5) that

(15) $E(R) = \dfrac{S_1^2 - S_2}{N - 1}$,

and from (7), (11), (13), (14), and (15) that the variance of $R$ is given by

(16) $\sigma^2(R) = E(R^2) - [E(R)]^2 = \dfrac{S_2^2 - S_4}{N - 1} + \dfrac{S_1^4 - 4 S_1^2 S_2 + 4 S_1 S_3 + S_2^2 - 2 S_4}{(N-1)(N-2)} - \dfrac{(S_1^2 - S_2)^2}{(N-1)^2}$.

The mean value and variance of $R$ can easily be computed from (15) and (16)
as soon as the values of $S_1$, $S_2$, $S_3$, and $S_4$ have been determined.

The formulas (15) and (16) are considerably simplified if $S_1 = 0$. In the
special case that $S_1 = 0$ we have

(15') $E(R) = -\dfrac{S_2}{N - 1}$

and

(16') $\sigma^2(R) = \dfrac{S_2^2 - S_4}{N - 1} + \dfrac{S_2^2 - 2 S_4}{(N-1)(N-2)} - \dfrac{S_2^2}{(N-1)^2}$.

We can always make $S_1$ equal to zero by replacing $a_\alpha$ by $b_\alpha = a_\alpha - N^{-1} \sum a_\alpha$.
This substitution is permissible, since it changes the statistic $R$ only by an
additive constant and consequently leaves the test procedure unaffected. Thus, in
practical applications it may be convenient to replace $a_\alpha$ by $b_\alpha$ and to use
formulas (15') and (16').
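
Formulas (15) and (16) can be checked against a brute-force enumeration of all permutations for a short series. The sketch below uses exact rational arithmetic so that the two computations can be compared for equality; the function names are hypothetical and the series must have at least four terms.

```python
from itertools import permutations
from fractions import Fraction

def moments_by_formula(a):
    """Mean and variance of R from (15) and (16), using the power sums S_r."""
    N = len(a)
    S1, S2, S3, S4 = (sum(Fraction(v) ** r for v in a) for r in (1, 2, 3, 4))
    mean = (S1**2 - S2) / (N - 1)
    var = ((S2**2 - S4) / (N - 1)
           + (S1**4 - 4 * S1**2 * S2 + 4 * S1 * S3 + S2**2 - 2 * S4) / ((N - 1) * (N - 2))
           - (S1**2 - S2) ** 2 / (N - 1) ** 2)
    return mean, var

def moments_by_enumeration(a):
    """Mean and variance of R over all N! equally likely permutations."""
    N = len(a)
    values = [sum(Fraction(p[i]) * p[(i + 1) % N] for i in range(N))
              for p in permutations(a)]
    mean = sum(values) / len(values)
    var = sum(v * v for v in values) / len(values) - mean**2
    return mean, var

a = [1, 2, 4, 7, 11]
print(moments_by_formula(a), moments_by_enumeration(a))
```

The formula requires only four power sums, whereas enumeration grows as $N!$; agreement on a small case gives some assurance that the formulas have been transcribed correctly before they are applied to longer series.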


4. Limiting distribution of R. Let $\{a_\alpha\}$ $(\alpha = 1, 2, \ldots$ ad inf.) be a sequence
of real numbers with the following properties:

a) There exists a sequence of numbers $A_1, A_2, \ldots, A_r, \ldots$ such that

(17) $\left|\dfrac{1}{N} \sum_{\alpha=1}^{N} a_\alpha^r\right| \le A_r \qquad (r = 1, 2, \ldots \text{ ad inf.})$

for all $N$. (This condition means that the moments about the origin of the
sequence $a_1, a_2, \ldots, a_N$ are bounded functions of $N$.)

b) If

$\delta(N) = \dfrac{1}{N}\left[\sum_{\alpha=1}^{N} a_\alpha^2 - \dfrac{1}{N}\left(\sum_{\alpha=1}^{N} a_\alpha\right)^2\right]$,

then

(18) $\liminf_{N \to \infty} \delta(N) > 0$.

(This condition means that the dispersion of the $N$ values $a_1, a_2, \ldots, a_N$ is
eventually bounded below.)

Let $R(N)$ be the serial correlation coefficient $R$ as defined in (4), where $x_1, \ldots,
x_N$ is a random permutation of $a_1, a_2, \ldots, a_N$. We shall prove the following

Theorem: As $N \to \infty$, the probability that

$\dfrac{R(N) - E(R(N))}{\sigma(R(N))} < t$

approaches the limit

$\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\, dx$.

For any function $f(N)$ and any positive function $\phi(N)$ let

$f(N) = O(\phi(N))$

mean that $|f(N)/\phi(N)|$ is bounded from above for all $N$, and let

$f(N) = \Omega(\phi(N))$

mean that

$f(N) = O(\phi(N))$

and that $\liminf |f(N)/\phi(N)| > 0$. Also let

$f(N) = o(\phi(N))$

mean that

$\lim_{N \to \infty} \dfrac{f(N)}{\phi(N)} = 0$.

Let $[\rho]$ denote the largest integer less than or equal to $\rho$.

To simplify the proof we shall temporarily assume:

c) There exists a positive constant $K$ such that, for every positive integral $N$,

(19) $-K < S_1 = \sum_{\alpha=1}^{N} a_\alpha < K$.

This restriction will be removed later.

Lemma 1: $\sum \cdots \sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1} \cdots a_{\alpha_k} = O(N^{[k/2]})$.

Proof: $\sum \cdots \sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1} \cdots a_{\alpha_k}$ can be written as the sum of a finite
number of terms where each term is a product of factors $S_r$ $(r = 1, 2, \ldots)$.
This representation will be called the normal representation of
$\sum \cdots \sum a_{\alpha_1} \cdots a_{\alpha_k}$. Since $S_1 = O(1)$ by (19) and $S_r = O(N)$ by (17) and since the number of
factors $S_r$ $(r > 1)$ in a single term of the normal representation of $\sum \cdots \sum
a_{\alpha_1} \cdots a_{\alpha_k}$ is at most $[k/2]$, the equation $\sum \cdots \sum a_{\alpha_1} \cdots a_{\alpha_k} = O(N^{[k/2]})$ must
hold.

Lemma 2: Let $y = x_1 \cdots x_k z$, where $z = x_{k+1}^{i_1} \cdots x_{k+r}^{i_r}$ and $i_j > 1$ $(j = 1, \ldots, r)$.
If $(x_1, \ldots, x_N)$ is a random permutation of $a_1, \ldots, a_N$, and if $k, r, i_1, \ldots, i_r$
are fixed values independent of $N$, then $E(y) = O(N^{[k/2]-k})$.

Proof: Let $E(y \mid x_{k+1}, \ldots, x_{k+r})$ be the conditional expected value of $y$ when
$x_{k+1}, \ldots, x_{k+r}$ are fixed. It follows easily from Lemma 1 that

$E(y \mid x_{k+1}, \ldots, x_{k+r}) = O(N^{[k/2]-k})$.

Hence also $E(y) = O(N^{[k/2]-k})$ and Lemma 2 is proved.

Denote $x_\alpha x_{\alpha+1}$ by $y_\alpha$ $(\alpha = 1, \ldots, N - 1)$ and $x_N x_1$ by $y_N$, and consider the
expansion of $(y_1 + \cdots + y_N)^r$. Let $y$ be a term of this expansion, i.e.,

$y = \dfrac{r!}{i_1! \cdots i_u!}\, y_{\alpha_1}^{i_1} \cdots y_{\alpha_u}^{i_u} \qquad (\alpha_1 < \alpha_2 < \cdots < \alpha_u)$.

We will say that two factors $y_\alpha$ and $y_\beta$ are neighbors if $|\alpha - \beta + 1|$ or
$|\alpha - \beta - 1|$ is either 0 or $N$. The set of
$u$ factors $y_{\alpha_1}, \ldots, y_{\alpha_u}$ can be subdivided into cycles as follows: The first cycle
contains $y_{\alpha_1}$ and all those $y_\alpha$ which can be reached from $y_{\alpha_1}$ by a succession of
neighboring $y_\alpha$. The second cycle contains the first $y_\alpha$ of the remaining
sequence and all those which can be reached from the first $y_\alpha$ by a succession of
neighboring $y_\alpha$. The third cycle is similarly constructed from the remaining
sequence, etc. After a finite number of cycles have been withdrawn the sequence
will be exhausted. If $m$ is the number of such cycles we will say that $y$ has $m$
cycles.

Lemma 3: Let $y$ be a term of the expansion $(x_1 x_2 + \cdots + x_N x_1)^r = (y_1 + \cdots
+ y_N)^r$ ($r$ fixed). Let $m$ be the number of cycles in $y$ and $k$ be the number of linear
factors in $y$ if $y$ is written as a function of $x_1, \ldots, x_N$ (i.e., if we replace $y_\alpha$ by
$x_\alpha x_{\alpha+1}$). Then the maximum value of $m + [k/2] - k$ is equal to $[r/2]$.

Proof: First we maximize $m + [k/2] - k$ with respect to $k$ when $m$ is fixed.
If $m \le [r/2]$, then the minimum value of $k$ is obviously zero. Let $m = [r/2] + r'$
$(r' > 0)$. The minimum value of $k$ is reached if each cycle consists of a single
factor $y_\alpha$ and if each factor $y_\alpha$ in $y$ is either linear or squared. If $r$ is even, then
the minimum value of $k$ is $4r'$ and if $r$ is odd then the minimum value of $k$ is
$4r' - 2$. Hence for $m = [r/2] + r'$ we have

$\max_k\, (m + [k/2] - k) = [r/2] - r'$ if $r$ is even

and

$= [r/2] - r' + 1$ if $r$ is odd.

Hence maximizing with respect to $m$ and $k$ we obtain

$\max\, (m + [k/2] - k) = [r/2]$,

and Lemma 3 is proved.

Lemma 4: The expected value of the sum of all those terms in the expansion of
$(x_1 x_2 + \cdots + x_N x_1)^r$ for which $m$ is the number of cycles and $k$ the number of linear
factors (if $y$ is expressed in terms of $x_1, \ldots, x_N$) is equal to $O(N^{m+[k/2]-k})$.

This Lemma follows from Lemma 2 and the fact that the number of terms $y$
with the required properties is $O(N^m)$.

Lemma 5:

$E(x_1 x_2 + \cdots + x_N x_1)^r = O(N^{[r/2]})$.

This follows from Lemmas 3 and 4.
Lemma 6: If $r$ is even then

$E(x_1 x_2 + \cdots + x_N x_1)^r = 2^{-r/2}\, r!\, C^N_{r/2}\, E(x_1^2 \cdots x_r^2) + o(N^{r/2})$.

Proof: It follows easily from our considerations in proving Lemma 3 that
$m + [k/2] - k < r/2$ for all terms in the expansion of $(x_1 x_2 + \cdots + x_N x_1)^r$ which
are not of the type $x_1^2 \cdots x_r^2$. Hence it follows from Lemma 4 that the expected
value of the sum of all those terms in the expansion of $(x_1 x_2 + \cdots + x_N x_1)^r$
which are not of the type $x_1^2 \cdots x_r^2$ is equal to $o(N^{r/2})$. Lemma 6 follows from the
fact that $2^{-r/2}\, r!$ is the coefficient of the terms of the type $x_1^2 \cdots x_r^2$ in the expansion
of $(x_1 x_2 + \cdots + x_N x_1)^r$ and that the number of terms of such type is equal to
$C^N_{r/2} + o(N^{r/2})$.

Lemma 7: $\lim_{N \to \infty} \dfrac{E(x_1 x_2 + \cdots + x_N x_1)^r}{[E(x_1 x_2 + \cdots + x_N x_1)^2]^{r/2}} = 0$ if $r$ is odd and $= 2^{-r/2}\, r!/(r/2)!$ if
$r$ is even.

Proof: From Lemma 6 it follows that

(20) $E(x_1 x_2 + \cdots + x_N x_1)^2 = N E(x_1^2 x_2^2) + o(N) = \Omega(N)$.

The first half of Lemma 7 follows from Lemma 5 and equation (20). If $r$
is even then it follows from (20) that

(21) $\lim_{N \to \infty} \dfrac{E(x_1 x_2 + \cdots + x_N x_1)^r}{[E(x_1 x_2 + \cdots + x_N x_1)^2]^{r/2}} = \lim_{N \to \infty} \dfrac{2^{-r/2}\, C^N_{r/2}\, r!\, E(x_1^2 \cdots x_r^2)}{N^{r/2}\, (E(x_1^2 x_2^2))^{r/2}} = \dfrac{r!}{2^{r/2} (r/2)!} \lim_{N \to \infty} \dfrac{E(x_1^2 \cdots x_r^2)}{(E(x_1^2 x_2^2))^{r/2}}$.

It follows from (17), (19), and the normal representation of symmetric
functions that

$k! \sum \cdots \sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1}^2 \cdots a_{\alpha_k}^2 = S_2^k + O(N^{k-1})$.

From (17) and (18) we have $S_2 = \Omega(N)$. Since

$E(x_1^2 \cdots x_r^2) = r! \left(\sum \cdots \sum_{\alpha_1 < \cdots < \alpha_r} a_{\alpha_1}^2 \cdots a_{\alpha_r}^2\right) [N(N-1) \cdots (N - r + 1)]^{-1}$,

we obtain

(22) $\lim_{N \to \infty} \dfrac{E(x_1^2 \cdots x_r^2)}{(E(x_1^2 x_2^2))^{r/2}} = 1$.

The second half of Lemma 7 follows from (21) and (22).
Lemma 8:

(23)    $\lim_{N\to\infty} \dfrac{E(R(N))}{\sigma(R(N))} = 0,$

(24)    $\lim_{N\to\infty} \dfrac{E(R^2(N))}{\sigma^2(R(N))} = 1.$

Proof: Equation (24) is a trivial consequence of (23). From (15) $E(R) = O(1)$
and from (16) $\sigma(R) = \Omega(N^{\frac{1}{2}})$. The lemma follows easily from these relations.
Proof of the Theorem. According to Lemma 7 the r-th moment of $R[E(R^2)]^{-\frac{1}{2}}$
approaches the r-th moment of the normal distribution as $N \to \infty$. From this
and Lemma 8 the required result follows if condition (c) holds. It remains
therefore merely to remove condition (c). Assume now only that $a_1, a_2, \cdots,
a_\alpha, \cdots$ satisfy conditions (a) and (b).

R(N) is formed from the population of values $a_1, a_2, \cdots, a_N$. Addition of
a constant q to $a_1, \cdots, a_N$ adds the same constant to all the values of R(N) and
hence leaves $[R(N) - E(R(N))]/\sigma(R(N))$ unaltered. Let $q_N$ be $-\frac{1}{N}\sum_{\alpha=1}^{N} a_\alpha$
and write $b_\alpha^{(N)} = a_\alpha + q_N$. Consider the sequences

$B^{(i)} = b_1^{(i)}, b_2^{(i)}, \cdots, b_i^{(i)}$    (i = 1, 2, ⋯, ad inf.).

From (17) it follows that the $|q_N|$ are bounded for all N. Hence the
sequences $B^{(i)}$ satisfy condition (a). They obviously satisfy condition (c). Since
S(j) is invariant under addition of a constant we have 

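$\liminf_{i\to\infty} \dfrac{1}{i}\Big(\sum_{\alpha=1}^{i} \big(b_\alpha^{(i)}\big)^2 - \dfrac{1}{i}\Big(\sum_{\alpha=1}^{i} b_\alpha^{(i)}\Big)^2\Big) > 0,$

so that the $B^{(i)}$ satisfy condition (b). Since $[R(N) - E(R(N))]/\sigma(R(N))$ has
the same distribution in the sequence $a_1, a_2, \cdots, a_N$ as in the sequence $B^{(N)}$,
the theorem follows.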

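It should be remarked that the theorem remains valid if conditions (a) and
(b) are replaced by the weaker condition

$\mu_r / \mu_2^{\frac{1}{2}r} = O(1)$    (r = 3, 4, ⋯, ad inf.)

where

$\mu_r = \dfrac{1}{N}\sum_{\alpha=1}^{N}\Big(a_\alpha - \dfrac{1}{N}\sum_{\beta=1}^{N} a_\beta\Big)^r.$

This follows easily from the fact that $[R(N) - E(R(N))]/\sigma(R(N))$ remains
unaltered if we replace the sequence $a_1, \cdots, a_N$ by the sequence $c_1, c_2, \cdots, c_N$
where

$c_\alpha = \Big(a_\alpha - \dfrac{1}{N}\sum_{\beta} a_\beta\Big) \Big/ \Big[\dfrac{1}{N}\sum_{\beta}\Big(a_\beta - \dfrac{1}{N}\sum_{\gamma} a_\gamma\Big)^2\Big]^{\frac{1}{2}}.$

Conditions (a) and (b) are obviously satisfied by the sequence $c_1, \cdots, c_N$.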


5. Transformation of the original observations.

Let f(t) be a continuous and strictly monotonic function of t ($-\infty < t < +\infty$).
Suppose we replace the original observations $a_1, \cdots, a_N$ by $d_1, \cdots, d_N$, where
$d_\alpha = f(a_\alpha)$ ($\alpha = 1, \cdots, N$). We obtain a valid test of significance if we carry
out the test procedure as if $d_1, \cdots, d_N$ were the observed values instead of
$a_1, \cdots, a_N$. We could also replace the observed values $a_1, \cdots, a_N$ by their
ranks. The question arises whether there is any advantage in making the test
on the transformed values instead of on the original observations. It may well
be that by certain transformations we could considerably increase the power of
the test with respect to alternatives under consideration. This problem needs
further study.

6. Summary. A test procedure based on serial correlation is given for testing
the hypothesis that $x_1, \cdots, x_N$ are independent observations from the same
population, i.e., that $x_1, \cdots, x_N$ is a random series. By considering the
distribution of the serial correlation coefficient in the subpopulation consisting of
all permutations of the actually observed values a test procedure is obtained
such that

a) if the common c.d.f. F(x) is continuous, the size of the critical region,
i.e., the probability of rejecting the hypothesis of randomness when it is true,
does not depend upon F(x),

b) if F(x) is not continuous but all its moments are finite and its variance is
positive, the size of the critical region approaches, as $N \to \infty$, the value it
would have if F(x) were continuous. Thus in the limit an exact test is possible
in this case as well.

It is shown that the test based on the serial correlation with lag h is equivalent
to the test based on the statistic

$R_h = \sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha}$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of $\alpha$ for which $h + \alpha > N$.
If h is prime to N, the distribution of $R_h$ is exactly the same as the
distribution of $R = R_1$.

The mean value and variance of R are given by the following expressions:

$E(R) = (S_1^2 - S_2)/(N - 1)$

and

$\sigma^2(R) = \dfrac{S_2^2 - S_4}{N - 1} + \dfrac{S_1^4 - 4S_1^2S_2 + 4S_1S_3 + S_2^2 - 2S_4}{(N - 1)(N - 2)} - \dfrac{(S_1^2 - S_2)^2}{(N - 1)^2},$

where $S_r = x_1^r + \cdots + x_N^r$.

It is shown that under some mild restrictions the limiting distribution of R is 
normal. The test procedure can therefore be easily carried out when N is 
sufficiently large to permit the use of the limiting distribution of R. 
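
The two moment formulas of the summary can be checked by brute force on a small sample. The following is a minimal sketch, not part of the original paper: it assumes the circular statistic R = x_1x_2 + ... + x_Nx_1 with lag 1, enumerates every permutation of a short illustrative list of values, and compares the enumerated mean and variance of R with the closed-form expressions in the power sums S_r. The function names and the sample values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def circular_r(x):
    """Lag-1 circular statistic R = x1*x2 + x2*x3 + ... + xN*x1."""
    n = len(x)
    return sum(x[i] * x[(i + 1) % n] for i in range(n))


def power_sum(values, r):
    """S_r = sum of the r-th powers of the observed values."""
    return sum(v ** r for v in values)


def moments_from_formulas(values):
    """Mean and variance of R from the closed-form expressions (N >= 4)."""
    n = len(values)
    s1, s2, s3, s4 = (power_sum(values, r) for r in (1, 2, 3, 4))
    mean = Fraction(s1 ** 2 - s2, n - 1)
    var = (Fraction(s2 ** 2 - s4, n - 1)
           + Fraction(s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4,
                      (n - 1) * (n - 2))
           - Fraction((s1 ** 2 - s2) ** 2, (n - 1) ** 2))
    return mean, var


def moments_by_enumeration(values):
    """Mean and variance of R over all N! equally likely permutations."""
    rs = [circular_r(p) for p in permutations(values)]
    mean = Fraction(sum(rs), len(rs))
    var = Fraction(sum(r * r for r in rs), len(rs)) - mean ** 2
    return mean, var


sample = [3, 1, 4, 1, 5, 9]          # illustrative integer observations
formula_moments = moments_from_formulas(sample)
enumerated_moments = moments_by_enumeration(sample)
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation; the enumeration is only practical for small N.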
ON A GENERAL CLASS OF “CONTAGIOUS” DISTRIBUTIONS

By W. Feller 
Brown University 

1. Introduction. In a paper of considerable interest, J. Neyman [11] recently 
discussed frequently occurring situations where the usual tests of significance 
fail. He discussed, in particular, experiences in entomology and bacteriology 
which cannot be described by the usual distribution functions and he constructed 
several new types of apparently contagious distributions. Now at first glance 
Neyman’s investigation may seem of a rather specialized nature, and his distri¬ 
butions of a restricted applicability. It may therefore be useful to point out 
that they are intimately related to results obtained by various authors in con¬ 
nection with topics having so little apparent relation as accident statistics, tele¬ 
phone traffic, fire damage, sickness- and life-insurance, risk theory, and even an 
engineering problem. Viewed in the proper light of a general theory, Neyman’s 
method is particularly closely related to some too little known considerations by 
Greenwood and Yule [6]. These authors were the first to find, and apply, the 
distribution which shortly afterwards was independently rediscovered by Eggen- 
berger and Polya 1 [3,4]. 

Greenwood and Yule discussed two types of what may conveniently be called 
contagion: with one type there is true contagion in the sense of Polya and Eggen- 
berger, where each “favorable” event increases (or decreases) the probability 
of future favorable events; with the second type the events are, strictly speak¬ 
ing, independent and an apparent contagion is actually due to an inhomogeneity 
of the population. The two explanations are very different in nature as well as 
in practical implications. It is therefore most remarkable that Greenwood and 
Yule found their distribution assuming an apparent contagion; in their opinion 
this distribution contradicts true contagion. On the contrary, Polya and Eggen- 
berger arrived at the same distribution assuming true contagion, while the possi¬ 
bility of an apparent contagion due to inhomogeneity seems not to have been 
noticed by them. The Greenwood-Yule-Polya-Eggenberger distribution has 
found many applications. 2 Therefore the possibility of its interpretation in two
ways, diametrically opposite in their nature as well as in their implications, is of
greatest statistical significance. This fact is, incidentally, a justification for 
general theories in statistics. 

We shall see that Neyman’s contagious distributions belong to the second 
type and are related to the Polya-Eggenberger distribution only if the latter is 

1 The fact that the Polya-Eggenberger distribution is identical with the Greenwood-Yule
distribution seems to be mentioned in the literature only in a Stockholm thesis by O. Lundberg [9].

2 Of quite recent applications we mention Kitagawa and Huruya [8], Rosenblatt [15],
O. Lundberg [9]. Only the latter seems aware of the double nature of the distribution.
interpreted in the sense of Greenwood and Yule. In Neyman’s case as well as
in the other cases referred to above we are concerned with inhomogeneous popu¬ 
lations and there exists an extremely simple device to describe such situations 
appropriately. Once stated, this device will appear trivial. Nevertheless, a 
straightforward application of it would have avoided considerable mathematical 
difficulties in the literature and, occasionally, yielded better and simpler results. 

It seems also the simplest description of the mechanism behind many observed 
distributions, and therefore suited for a theory of tests. 3

To start in a purely formal manner, consider an arbitrary cumulative distribution
function (c.d.f.) $F(x, a)$, depending on a parameter a, and another c.d.f.
$U(a)$. Then

(1.1)    $G(x) = \int F(x, a)\, dU(a)$

(the integration extending over the domain of variation of a) is again a c.d.f.
If, in particular, $U(a)$ is a step function, (1.1) reduces to

(1.2)    $G(x) = \sum_i p_i F(x, a_i),$

where $p_i$ is the weight attached to $a_i$ (we have, of course, $p_i > 0$, $\sum_i p_i = 1$).

Instead of (1.2) one can write more simply

(1.3)    $G(x) = \sum_i p_i F_i(x),$

where the $F_i(x)$ are arbitrary c.d.f.’s. Of course, $F(x, a)$ and $U(a)$ may depend
on additional parameters, and the procedure can be repeated.

The statistical meaning of (1.3) is clear. Consider a population made up of
several subgroups $A_1, A_2, \cdots$, mixed at random in proportions $p_1 : p_2 : \cdots$.
If $F_i(x)$ is the c.d.f. of some character in $A_i$, then G(x), as defined by (1.3), will
represent the c.d.f. of that character in the total population, provided that the
subgroups $A_i$ are statistically independent. Similarly (1.1) describes an infinitely
composite population. Postponing a discussion of the property of contagion
to the last section, we shall first deduce a few properties of the compound
Poisson distribution, considered first by Greenwood and Yule. Neyman’s
“Contagious Distributions of Type A” as well as the Polya-Eggenberger distribution
belong to this class. Our next example of a special case of (1.1) is what
F. E. Satterthwaite [16] called the “Generalized Poisson Distribution.” It has
been independently discovered by many authors and represents heterogeneity
of quite a different nature. Instead of further examples we shall, in the fourth
section, show how Neyman’s most general contagious distribution can be deduced
by a repeated application of (1.1).
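
The mixing operation (1.3) is simple enough to state in a few lines of code. The sketch below is only an illustration under stated assumptions: the subgroup laws are taken to be normal c.d.f.'s (a choice made here for convenience, not one made in the text), and the names `normal_cdf`, `mixture_cdf`, and the numerical weights are hypothetical.

```python
import math


def normal_cdf(mean, sd):
    """Return the c.d.f. of a normal law as a function of x (illustrative F_i)."""
    return lambda x: 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(weights, cdfs):
    """G(x) = sum_i p_i F_i(x), as in (1.3); weights must be positive and sum to 1."""
    if any(p <= 0 for p in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be positive and sum to 1")
    return lambda x: sum(p * f(x) for p, f in zip(weights, cdfs))


# Two subgroups mixed in proportions 0.3 : 0.7 (assumed values).
G = mixture_cdf([0.3, 0.7], [normal_cdf(0.0, 1.0), normal_cdf(4.0, 2.0)])
grid = [G(-10.0 + 0.5 * i) for i in range(61)]   # G evaluated on [-10, 20]
```

The resulting G is again a c.d.f.: it is non-decreasing and runs from 0 to 1, which is all that the formal statement following (1.1) claims.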

3 Incidentally, attention may be drawn to an argument by Greenwood and Yule showing
that the $\chi^2$-test when applied to the Poisson distribution is biased and tends to exaggerate
the goodness of fit. The argument could be amplified from other experience.
Notation. If F(x) and G(x) are the c.d.f.’s of two independent variates X and
Y, then their convolution (that is to say the c.d.f. of X + Y) will be denoted by
$F(x)*G(x)$. Thus

(1.4)    $F(x)*G(x) = \int_{-\infty}^{\infty} F(x - y)\, dG(y).$

More particularly we shall write

(1.5)    $F(x)*F(x) = F^{2*}(x), \qquad F^{n*}(x)*F(x) = F^{(n+1)*}(x).$

We shall denote by E(x) the unitary c.d.f.

(1.6)    $E(x) = \begin{cases} 0 & \text{for } x < 1, \\ 1 & \text{for } x \ge 1, \end{cases}$

so that $E^{n*}(x) = 0$ for $x < n$, and 1 for $x \ge n$.


2. The compound Poisson distribution. Consider the well-known Poisson
expression

(2.1)    $\pi(n; a) = e^{-a}\dfrac{a^n}{n!},$

where the parameter a > 0 gives the expected number of “events”. We shall
refer to (2.1) as the simple Poisson distribution. If different individuals of a
population are associated with different values of a, and if the character a is
distributed according to the cumulative probability law U(a), the probability
of n events in the total population will be given by

(2.2)    $\pi_n = \int_0^\infty e^{-a}\dfrac{a^n}{n!}\, dU(a).$

Following Greenwood and Yule we shall refer to (2.2) as the compound Poisson
distribution. Referring for an interpretation to the last section, we first consider
a few special cases.

a) If U(a) is a step function we are led to expressions of the form

(2.3)    $\pi_n = \dfrac{1}{n!}\sum_i p_i\, e^{-a_i} a_i^n.$

Such a distribution has been successfully applied by C. Palm [12] to problems of
telephone traffic, and by O. Lundberg [9] to sickness statistics.

b) If U(a) is a Pearson Type III distribution 
(with d > 0, h > 0), then



This is the Polya-Eggenberger distribution in its usual form, and has in this form
(with a slight change of notations) been derived by Greenwood and Yule.

c) If a takes on the values kc only, where c > 0 is a constant and k = 0, 1, 2, ⋯,
and if a is distributed according to the Poisson law

(2.6)    $\mathrm{Prob}\{a = kc\} = e^{-\lambda}\dfrac{\lambda^k}{k!},$

then

(2.7)    $\pi_n = e^{-\lambda}\dfrac{c^n}{n!}\sum_{k=0}^{\infty}\dfrac{k^n}{k!}\,(e^{-c}\lambda)^k.$

This is Neyman’s contagious distribution of type A depending on two parameters
(cf. section 4). If, instead, a is distributed according to a multiple Poisson law
of form (2.3) we arrive at Neyman’s more-parametric distribution of type A.
They are, of course, essentially linear combinations of expressions of form (2.7).

It follows from the theory of Laplace transforms that two compound Poisson
distributions associated with different c.d.f.’s U(a) are never identical.
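
Case b) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than a reproduction of the formulas of the text: it takes U(a) to have a gamma (Pearson Type III) density with a shape and a scale parameter, a parametrization chosen here and not necessarily the (h, d) parametrization of the paper, evaluates the integral (2.2) by Simpson's rule, and compares the result with the negative binomial probabilities that this mixing law is known to produce. All function names are hypothetical.

```python
import math


def poisson_pmf(n, a):
    """Simple Poisson probability pi(n; a) of (2.1)."""
    return math.exp(-a) * a ** n / math.factorial(n)


def gamma_pdf(a, shape, scale):
    """Density of the mixing law U (gamma form; parametrization assumed here)."""
    return a ** (shape - 1) * math.exp(-a / scale) / (math.gamma(shape) * scale ** shape)


def compound_poisson_gamma(n, shape, scale, upper=60.0, steps=60000):
    """Evaluate (2.2) by Simpson's rule on [0, upper]; steps must be even."""
    h = upper / steps

    def integrand(a):
        return poisson_pmf(n, a) * gamma_pdf(a, shape, scale)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def negative_binomial_pmf(n, shape, scale):
    """Closed form of the gamma-mixed Poisson probabilities."""
    log_coef = math.lgamma(n + shape) - math.lgamma(n + 1) - math.lgamma(shape)
    p = 1.0 / (1.0 + scale)
    return math.exp(log_coef) * p ** shape * (1.0 - p) ** n


SHAPE, SCALE = 3.0, 0.5              # illustrative parameter values
numeric = [compound_poisson_gamma(n, SHAPE, SCALE) for n in range(4)]
closed = [negative_binomial_pmf(n, SHAPE, SCALE) for n in range(4)]
```

The integer shape keeps the integrand smooth at the origin, so that Simpson's rule converges quickly; for a shape below 1 a different quadrature would be needed.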

The compound Poisson distribution gives a simple explanation of a phenomenon
recorded by Neyman and observable in many instances. In the experiments
described by Neyman “the attempts to fit the Poisson Law ⋯ failed
almost invariably with the characteristic feature that, as compared with the
Poisson Law, there were too many empty plots and too few plots with only one
larva”. It is easily checked in the literature that similar situations arise
frequently. Now the Poisson distribution is usually fitted by the method of
moments. Accordingly, the compound Poisson law (2.2) ought to be compared
with the simple Poisson distribution with the same mean value. The mean
value of (2.2) is

(2.8)    $m = \int_0^\infty a\, dU(a),$

so that (2.2) ought to be compared with the Poisson distribution $\pi(n; m)$. Now,
whatever the c.d.f. U(a), we have always

(2.9)    $\pi_0 \ge \pi(0; m)$

and

(2.10)    $\dfrac{\pi_1}{\pi_0} \le m = \dfrac{\pi(1; m)}{\pi(0; m)}.$
As a matter of fact, using Lagrange’s form for the remainder in Taylor’s formula,
we have

(2.11)    $\pi_0 = e^{-m}\int_0^\infty e^{m-a}\, dU(a) \ge e^{-m}\int_0^\infty \big(1 + (m - a)\big)\, dU(a) = e^{-m} = \pi(0; m),$

which proves (2.9). Similarly

(2.12)    $m\pi_0 - \pi_1 = e^{-m}\int_0^\infty e^{m-a}(m - a)\, dU(a) \ge e^{-m}\int_0^\infty (m - a)\, dU(a) = 0,$

which proves (2.10).
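
The inequalities (2.9) and (2.10) are easy to observe on a small example. The following sketch assumes a step-function U(a) with two jumps, so that the compound law takes the form (2.3); the weights and rates are illustrative values chosen here, and the function names are hypothetical.

```python
import math


def poisson_pmf(n, a):
    """Simple Poisson probability pi(n; a) of (2.1)."""
    return math.exp(-a) * a ** n / math.factorial(n)


def compound_pmf(n, weights, rates):
    """pi_n of (2.3): a finite mixture of simple Poisson laws."""
    return sum(p * poisson_pmf(n, a) for p, a in zip(weights, rates))


weights, rates = [0.6, 0.4], [0.5, 3.0]          # assumed two-point mixing law
m = sum(p * a for p, a in zip(weights, rates))   # mean value, as in (2.8)

pi0 = compound_pmf(0, weights, rates)
pi1 = compound_pmf(1, weights, rates)
simple0 = poisson_pmf(0, m)                      # pi(0; m) of the fitted simple law
simple1 = poisson_pmf(1, m)                      # pi(1; m)
```

Here the compound law puts more mass on "no event" than the simple Poisson law with the same mean, and the ratio pi1/pi0 falls below m, in line with the two inequalities.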

The above theorem shows that, whenever the material under observation is not
quite homogeneous so that the compound Poisson law applies instead of the simple
one, there will be too many cases with “no event” and, as compared with these cases,
too few with “one event”. It should be noticed, however, that it is not strictly
true that always

(2.13)    $\pi_1 < \pi(1; m).$

As a matter of fact, even in the numerical example given by Neyman, the computed
value $\pi_1$ exactly equals the observed value. Still, the inequality (2.13)
will hold whenever the third moment about the mean of U(a) is smaller than
twice the second. Writing

(2.14)    $\sigma^2 = \int_0^\infty (a - m)^2\, dU(a), \qquad \mu_3 = \int_0^\infty (a - m)^3\, dU(a),$

and using two more terms in the Taylor development of $e^{m-a}$ than in (2.11) and
(2.12) we see that

(2.15)    $\pi_0 \ge e^{-m}\Big\{1 + \dfrac{\sigma^2}{2} - \dfrac{\mu_3}{6}\Big\}$

and

(2.16)    $m\pi_0 - \pi_1 \ge e^{-m}\big(\sigma^2 - \tfrac{1}{2}\mu_3\big).$

These inequalities are slightly sharper than (2.9) and (2.10), and often permit us
to estimate the variance of U(a).

We note furthermore that the variance of the compound Poisson distribution is

(2.17)    $\sigma^2 + m$

as compared with the variance m of the corresponding simple Poisson distribution.
Finally the following important property of the compound distribution may be
mentioned: Consider two independent variates X and Y distributed according to
two compound Poisson distributions $\{\pi_n^{(1)}\}$ and $\{\pi_n^{(2)}\}$ associated with the c.d.f.’s
$U_1(a)$ and $U_2(a)$, respectively. Then the variate X + Y is distributed according to
a compound Poisson law $\{\pi_n\}$ associated with the c.d.f. $U(a) = U_1(a)*U_2(a)$
(cf. (1.4)).

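It suffices to note that $U_i(a) = 0$ for $a < 0$, so that

$U(a) = \int_0^a U_1(a - s)\, dU_2(s);$

therefore, after a permitted change of the order of integration

$$\begin{aligned}
\pi_n &= \int_0^\infty e^{-a}\frac{a^n}{n!}\, dU(a) \\
&= \int_0^\infty dU_2(s) \int_s^\infty e^{-a}\frac{a^n}{n!}\, dU_1(a - s) \\
&= \int_0^\infty dU_2(s) \int_0^\infty e^{-(s+t)}\frac{(s+t)^n}{n!}\, dU_1(t) \\
&= \frac{1}{n!}\sum_{k=0}^{n}\frac{n!}{k!\,(n-k)!}\int_0^\infty e^{-t}t^k\, dU_1(t)\int_0^\infty e^{-s}s^{n-k}\, dU_2(s)
 = \sum_{k=0}^{n}\pi_k^{(1)}\pi_{n-k}^{(2)};
\end{aligned}$$

the last expression represents the convolution of $\{\pi_n^{(1)}\}$ and $\{\pi_n^{(2)}\}$.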

Neyman’s distributions of type A with two parameters are special cases of a
compound Poisson process where U(a) is a step function with jumps at equidistant
places, the jumps being given by a simple Poisson distribution $\{\pi(n; \lambda)\}$.
Now the convolution of two such distributions is again a simple Poisson distribution
$\{\pi(n; 2\lambda)\}$ with jumps at the same places; hence the convolution of two
distributions of type A is again a similar distribution with one parameter doubled.
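
Both the variance formula (2.17) and the convolution property can be verified numerically when the mixing laws are step functions. The sketch below is an illustration under that assumption: each U is represented as a dictionary mapping a rate a_i to its weight p_i, the convolution U_1*U_2 is formed by adding rates and multiplying weights, and sums over n are truncated at a point where the remaining Poisson mass is negligible. The names and numerical values are hypothetical.

```python
import math
from itertools import product


def poisson_pmf(n, a):
    """Simple Poisson probability pi(n; a) of (2.1)."""
    return math.exp(-a) * a ** n / math.factorial(n)


def compound_pmf(n, mixing):
    """pi_n of (2.3); `mixing` maps each rate a_i to its weight p_i."""
    return sum(p * poisson_pmf(n, a) for a, p in mixing.items())


def convolve_mixing(u1, u2):
    """Step-function form of U = U1 * U2: rates add, weights multiply."""
    out = {}
    for (a1, p1), (a2, p2) in product(u1.items(), u2.items()):
        out[a1 + a2] = out.get(a1 + a2, 0.0) + p1 * p2
    return out


def mixing_mean_and_variance(mixing):
    """m and sigma^2 of the mixing law U, as in (2.8) and (2.14)."""
    m = sum(p * a for a, p in mixing.items())
    var = sum(p * (a - m) ** 2 for a, p in mixing.items())
    return m, var


def compound_variance(mixing, terms=100):
    """Variance of the compound law by direct (truncated) summation over n."""
    probs = [compound_pmf(n, mixing) for n in range(terms)]
    mean = sum(n * q for n, q in enumerate(probs))
    return sum((n - mean) ** 2 * q for n, q in enumerate(probs))


u1 = {0.5: 0.6, 3.0: 0.4}            # assumed mixing law of X
u2 = {1.0: 0.5, 2.0: 0.5}            # assumed mixing law of Y
u = convolve_mixing(u1, u2)

# Left side: compound law built on U1*U2.  Right side: convolution of the two laws.
direct = [compound_pmf(n, u) for n in range(8)]
convolved = [sum(compound_pmf(k, u1) * compound_pmf(n - k, u2) for k in range(n + 1))
             for n in range(8)]
```

The dictionary representation is only suitable for finitely many jumps; a continuous U would call for numerical integration instead.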

As mentioned before, the notion of a compound Poisson distribution is due to
Greenwood and Yule [6]. The time dependent compound Poisson process has
been the object of detailed investigations by J. Dubourdieu [2] and O. Lundberg
[9]. The latter has discussed also the problem of fitting the compound Poisson
process to empirical distributions.


3. The generalized Poisson distribution. Let F(x) be an arbitrary c.d.f.
Then its n-fold convolution $F^{n*}(x)$ (cf. (1.5)) may be considered as a c.d.f.
depending on a parameter n. Choosing, for the latter, the simple Poisson
distribution (2.1) and performing the operation indicated in (1.1), we arrive at the
c.d.f. of the generalized Poisson law

(3.1)    $G(x) = \sum_{n=0}^{\infty} e^{-a}\dfrac{a^n}{n!}\, F^{n*}(x).$

If, in particular, F(x) is the unitary function (1.6), we have the ordinary Poisson
law

(3.2)    $\Pi(x) = \sum_{n=0}^{\infty} e^{-a}\dfrac{a^n}{n!}\, E^{n*}(x) = \sum_{n=0}^{[x]} e^{-a}\dfrac{a^n}{n!}$

in its cumulative form.

The most frequently encountered application of the generalized Poisson dis¬ 
tribution is to problems of the following type. Consider independent random 
events for which the simple Poisson distribution may be assumed, such as
telephone calls, the occurrence of claims in an insurance company, fire accidents, 
sickness, and the like. With each event there may be associated a random 
variable X. Thus, in the above examples, X may represent the length of the 
ensuing conversation, the sum under risk, the damage, the cost (or length) of 
hospitalization, respectively. To mention an interesting example of a different 
type, A. Einstein Jr. [5] and G. Polya [13,14] have studied a problem arising out 
of engineering practice connected with the building of dams, where the events 
consist of the motions of a stone at the bottom of a river; the variable X is the 
distance through which the stone moves down the river. 

Now, if F(x) is the c.d.f. of the variable X associated with a single event, then
$F^{n*}(x)$ is the c.d.f. of the accumulated variable associated with n events. Hence
(3.1) is the probability law of the sum of the variables (sum of the conversation 
times, total sum paid by the company, total damage, total distance travelled by 
the stone, etc.). 

In view of the above examples, it is not surprising that the law (3.1), or special 
cases of it, have been discovered, by various means and sometimes under dis¬ 
guised forms, by many authors. Quite recently Satterthwaite [16] was led to it 
(in the above simple form) from problems in insurance. Related (but less elegant)
considerations may be found in a paper by W. G. Ackermann [1]. Simple
as they are, the above considerations leading to (3.1) furnish a complete solution 
of the problem in all the cases mentioned. Unfortunately, the special features 
of the problems often so overshadow the essential point, that one is often led to 
unnecessarily complicated and incomplete solutions. As an example of the diffi¬ 
culties in considering special cases we mention that Polya [13, 14] was led to a 
partial differential equation of the hyperbolic type, which conceals the elementary 
nature of the problem. 

If F(x) is itself a Poisson c.d.f., (3.1) reduces to (2.7). Thus Neyman’s distribution
of type A depending on two parameters is both a compound and a generalized
Poisson distribution. We shall later on see that the generalized Poisson distribution
plays an even more important rôle in Neyman’s theory.
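
The reduction of the generalized Poisson law to (2.7) can be checked directly when the single-event variable is integer valued. The sketch below is an illustration under assumptions made here: it works with probabilities rather than c.d.f.'s, truncates the support to 0, 1, ..., size − 1 and the number of events to a fixed maximum, and identifies the Poisson parameter a of the number of events with the parameter λ of (2.7). The function names and numerical values are hypothetical.

```python
import math


def poisson_pmf_list(rate, size):
    """Probabilities of a simple Poisson law on 0, 1, ..., size - 1."""
    return [math.exp(-rate) * rate ** k / math.factorial(k) for k in range(size)]


def convolve(f, g, size):
    """Convolution of two probability lists, kept on 0, 1, ..., size - 1."""
    return [sum(f[j] * g[i - j] for j in range(i + 1)) for i in range(size)]


def generalized_poisson_pmf(a, step_pmf, size, max_events=60):
    """Discrete analogue of the generalized Poisson law:
    sum over n of e^{-a} a^n / n! times the n-fold convolution of step_pmf."""
    power = [1.0] + [0.0] * (size - 1)       # 0-fold convolution: unit mass at 0
    total = [0.0] * size
    for n in range(max_events + 1):
        weight = math.exp(-a) * a ** n / math.factorial(n)
        total = [t + weight * p for t, p in zip(total, power)]
        power = convolve(power, step_pmf, size)
    return total


def neyman_type_a_pmf(n, lam, c, max_k=60):
    """Formula (2.7): e^{-lam} c^n / n! * sum_k k^n (lam e^{-c})^k / k!."""
    series = sum(k ** n * (lam * math.exp(-c)) ** k / math.factorial(k)
                 for k in range(max_k + 1))
    return math.exp(-lam) * c ** n / math.factorial(n) * series


LAM, C, SIZE = 2.0, 1.5, 12                  # illustrative parameter values
via_convolutions = generalized_poisson_pmf(LAM, poisson_pmf_list(C, SIZE), SIZE)
via_formula = [neyman_type_a_pmf(n, LAM, C) for n in range(SIZE)]
```

Truncating the convolution to the first `size` values loses nothing for those values, since the probability at n depends only on the probabilities at 0, ..., n of the factors.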

The main properties of (3.1) are easily derived using characteristic functions.
If $\varphi(z)$ is the characteristic function of F(x), the characteristic function of G(x)
is

(3.3)    $\psi(z) = e^{a(\varphi(z) - 1)}.$



